GPU compute, billed by the task
Save your AI template once, then run it through a simple API. Predictable per-task pricing: no GPU rentals, no subscriptions.
Why Choose RunKey.ai
We handle the full complexity of GPU infrastructure so you can focus on building AI applications
One GPU, One Rate
Every task runs on an NVIDIA RTX 4090. No GPU model selector, no premium tier surcharge — just one card, one fixed per-unit rate across the entire platform.
Fixed 60-Second Billing Units
Tasks bill at a flat 20 credits per 60-second unit of actual GPU time, rounded up to the next unit. Same hardware, same rate, every time. Failed tasks don't consume credits.
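As a rough illustration of the billing rule above, the credits for a single task can be estimated as follows. This is a minimal sketch, not RunKey's billing code: the function name is hypothetical, and it assumes that a task using exactly 60 seconds bills one unit and that a task with no GPU time bills nothing.

```python
import math

CREDITS_PER_UNIT = 20  # flat rate stated above
UNIT_SECONDS = 60      # billing unit length stated above


def credits_for_task(gpu_seconds: float, succeeded: bool = True) -> int:
    """Estimate the credits billed for one task (hypothetical helper)."""
    if not succeeded:
        return 0  # failed tasks don't consume credits
    if gpu_seconds <= 0:
        return 0  # assumption: no GPU time means nothing to bill
    # Round actual GPU time up to whole 60-second units.
    units = math.ceil(gpu_seconds / UNIT_SECONDS)
    return units * CREDITS_PER_UNIT


print(credits_for_task(45))                    # 1 unit  -> 20
print(credits_for_task(61))                    # 2 units -> 40
print(credits_for_task(125, succeeded=False))  # failed  -> 0
```

A 61-second task costs the same as a 120-second one, so estimates are easiest when you think in whole units.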
Template-Native APIs
Publish a reusable compute template, then invoke it through a simple API — perfect for product teams, internal tools, and customer-facing workloads.
Built For Operations
Automatic scheduling, capacity balancing, and failover keep workloads moving even when demand spikes or nodes degrade.
Stop Overpaying for GPU Rentals
Self-hosted GPU vs RunKey.ai cost comparison
Self-Hosted RTX 4090 Server
RunKey.ai GPU Cloud Compute
Enterprise-Grade Elastic Compute Infrastructure
Intelligent scheduling, self-healing nodes, elastic scaling — GPU compute that never stops
Elastic Auto-Scaling
Automatically scale GPU nodes up or down based on load — instant scaling during peaks, automatic release when idle
Automatic Node Failover
When any node fails, the platform automatically detects and replaces it with a healthy node — zero task interruption
Intelligent Scheduling Engine
Routes each task to the closest available RTX 4090 node based on queue depth and load — no manual placement, no GPU selection
Fast Task Launch
Pre-warmed resource pools and rapid scheduling mean tasks begin executing about 2 seconds after submission, on average.
Full-Stack Monitoring
Real-time monitoring of GPU utilization, task queues, and node health with automatic anomaly alerting
High-Availability SLA
99.9% uptime commitment with multi-region redundant deployments ensuring business continuity
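To put the 99.9% figure in concrete terms, the sketch below converts an uptime percentage into a downtime allowance. The page does not say over which window the SLA is measured, so the 30-day default is an assumption, and the function name is hypothetical.

```python
from decimal import Decimal


def downtime_budget_minutes(uptime_percent: str, period_days: int = 30) -> Decimal:
    """Minutes of downtime allowed by an uptime percentage over a period."""
    period_minutes = Decimal(period_days) * 24 * 60
    allowed_fraction = (Decimal(100) - Decimal(uptime_percent)) / Decimal(100)
    return period_minutes * allowed_fraction


# Assumed 30-day window: roughly 43 minutes of allowed downtime.
print(downtime_budget_minutes("99.9"))
# Over a 365-day year: roughly 8.8 hours, expressed in minutes.
print(downtime_budget_minutes("99.9", 365))
```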
Get Started in Four Simple Steps
Create Your Template
Start in the console by creating a compute template, adding its name, description, and the basic information your team needs to manage it.
Upload And Configure
Upload your template definition and define the input, output, concurrency, and runtime settings needed for production use.
Submit And Publish
Submit the template for review. RunKey validates the per-unit rate, max execution timeout, and configuration before publishing.
Get API Key And Invoke
Create your API key, invoke your template through a single API call, and let RunKey allocate a GPU per task — execution and result delivery handled automatically.
GPU Compute in Just a Few Lines of Code
Developer-Friendly RESTful API
No GPU programming expertise required. No CUDA installation needed. One HTTP request, and RunKey.ai handles all GPU scheduling and inference computation for you.
- Standard RESTful API, callable from any language
- Automatic GPU scheduling — no hardware selection needed
- Asynchronous task processing with long-running support
- Webhook auto-delivery of results
- Comprehensive error codes and retry mechanisms
- Official Python / Node.js SDKs
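The asynchronous flow in the list above (submit, then fetch the result later) can be sketched in plain Python without an SDK. The endpoints, the `task_id`, `status`, and `output` fields, and the `queued` and `completed` states come from the curl sample on this page; the `failed` status name, the polling interval, and the function names are assumptions, and this is not the official SDK.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

API_BASE = "https://api.runkey.ai/v1"  # base URL taken from the curl sample


def http_json(method: str, url: str, api_key: str, body: Optional[dict] = None) -> dict:
    """Send one request with a Bearer token and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def run_and_wait(
    template_id: str,
    inputs: dict,
    api_key: str,
    send: Callable[..., dict] = http_json,
    sleep: Callable[[float], None] = time.sleep,
    poll_seconds: float = 2.0,  # assumed interval, not a documented value
    max_polls: int = 150,       # assumed upper bound on polling attempts
) -> dict:
    """Submit a task, then poll its status until it completes."""
    task = send("POST", f"{API_BASE}/templates/{template_id}/run", api_key, {"inputs": inputs})
    task_id = task["task_id"]
    for _ in range(max_polls):
        state = send("GET", f"{API_BASE}/tasks/{task_id}", api_key, None)
        if state["status"] == "completed":
            return state["output"]
        if state["status"] == "failed":  # assumed status name
            raise RuntimeError(f"task {task_id} failed: {state}")
        sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

Calling `run_and_wait("tpl_portrait_fix", {"source_image": "https://your-server.com/photo.jpg", "quality": "high"}, api_key)` would mirror the two curl calls. The `send` and `sleep` parameters are injectable so the loop can be exercised without network access; with webhook delivery configured, polling would not be needed at all.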
# 1. Submit an AI task to the GPU cluster
curl -X POST https://api.runkey.ai/v1/templates/tpl_portrait_fix/run \
  -H "Authorization: Bearer rk_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "source_image": "https://your-server.com/photo.jpg",
      "quality": "high"
    }
  }'
# Response: Task submitted, GPU is processing
# {"task_id": "task_abc123", "status": "queued", "estimated_time": 10}

# 2. Query task result
curl https://api.runkey.ai/v1/tasks/task_abc123 \
  -H "Authorization: Bearer rk_live_xxxxx"
# Response: Processing complete
# {"status": "completed", "output": {"result_url": "https://cdn.runkey.ai/..."}}
Powered by NVIDIA RTX 4090
We standardized on a single GPU on purpose. One card means one fixed per-unit rate, and a cost you can work out in your head before you ever invoke a task.
NVIDIA RTX 4090
Ada Lovelace · 24GB GDDR6X
One GPU, one rate
Every task on every account runs on the same NVIDIA RTX 4090 — no premium tier surcharge, no GPU selector.
Fully predictable cost
Same hardware every run means same throughput. 20 credits per 60-second unit, always.
No tier-shopping
Skip the spec-sheet decision. Build your template, ship it, and let the scheduler place every task.
Same rate for every workload, every account, every task. Bulk credit packs only change the per-credit dollar price — never the per-unit rate.
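The split between the fixed per-unit rate and the variable per-credit price can be shown with a small calculation. The pack names and dollar prices below are invented for illustration only, since this page does not list actual pack prices.

```python
from decimal import Decimal

CREDITS_PER_UNIT = 20  # per-unit rate stated on this page

# Hypothetical credit packs: these prices are NOT RunKey's real prices.
EXAMPLE_PACKS = {
    "starter": Decimal("0.010"),  # assumed dollars per credit
    "bulk": Decimal("0.008"),     # assumed dollars per credit
}


def task_cost_usd(units: int, usd_per_credit: Decimal) -> Decimal:
    """Dollar cost of a task that used the given number of 60-second units."""
    credits = units * CREDITS_PER_UNIT  # identical for every pack
    return credits * usd_per_credit


# A 3-unit task always costs 60 credits; only the dollar value differs.
for name, price in EXAMPLE_PACKS.items():
    print(name, 3 * CREDITS_PER_UNIT, "credits =", task_cost_usd(3, price), "USD")
```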
Transparent Pay-As-You-Go Pricing
Pay in 60-second units of actual GPU time — only on successful tasks, never for idle infrastructure
Earn 30% when others run your templates
Publish a template once and let other RunKey users invoke it. Every external run pays you 30% of the credits spent — automatically tracked, and paid out in USDC on Polygon straight from the console. No application, no separate contract; revenue share is on by default for every published template.
30% Of Every External Run
When another account invokes your published template, 30% of the credits they spend become creator earnings. Your own runs of your own templates are excluded.
Tracked In Real Time
The console's Earnings page breaks revenue down per-template, separates pending vs. available balance, and lists every payout you've ever requested.
Withdraw From $200
Once your available balance reaches $200, request a payout in USDC on the Polygon network. No subscription, no monthly fee — just earnings.
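The revenue-share rules above can be sketched as a short calculation. The run record shape (`account_id`, `credits`) and the function names are hypothetical; only the 30% share, the exclusion of your own runs, and the $200 threshold come from this page.

```python
from decimal import Decimal

CREATOR_SHARE = Decimal("0.30")   # 30% of credits spent on external runs
MIN_PAYOUT_USD = Decimal("200")   # withdrawal threshold stated above


def creator_earnings_credits(runs: list, owner_id: str) -> Decimal:
    """Sum 30% of the credits spent by accounts other than the owner."""
    external = sum(r["credits"] for r in runs if r["account_id"] != owner_id)
    return Decimal(external) * CREATOR_SHARE


def can_withdraw(available_usd: Decimal) -> bool:
    """True once the available balance has reached the payout threshold."""
    return available_usd >= MIN_PAYOUT_USD


runs = [
    {"account_id": "acct_a", "credits": 40},
    {"account_id": "acct_owner", "credits": 100},  # owner's own run, excluded
    {"account_id": "acct_b", "credits": 60},
]
print(creator_earnings_credits(runs, "acct_owner"))  # 30% of 100 external credits
print(can_withdraw(Decimal("199.99")), can_withdraw(Decimal("200")))
```

How earned credits convert to a dollar balance is not described here, so the sketch keeps the two quantities separate.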
Frequently Asked Questions
The most common questions from teams evaluating task-based GPU infrastructure for production workloads.
Run GPU workloads with predictable economics
Infrastructure-grade execution for AI products, internal tooling, and enterprise pipelines.
View Documentation
On-demand compute · 60-second billing units · Capped per task