RTX 4090 GPU · Fixed Per-Unit Pricing

GPU compute, billed by the task

Save your AI template once, run it through a simple API. Predictable per-task pricing — no GPU rentals, no subscriptions.

RTX 4090
Single GPU, every task
20 / unit
Fixed credits per 60s
99.9%
Platform SLA

Why Choose RunKey.ai

We handle the full complexity of GPU infrastructure so you can focus on building AI applications

One GPU, One Rate

Every task runs on an NVIDIA RTX 4090. No GPU model selector, no premium tier surcharge — just one card, one fixed per-unit rate across the entire platform.

Fixed 60-Second Billing Units

Tasks bill at a flat 20 credits per 60-second unit of actual GPU time, rounded up to the next unit. Same hardware, same rate, every time. Failed tasks don't consume credits.
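The rounding rule can be sketched in a few lines of Python. This is a minimal illustration of the billing arithmetic described above, not RunKey's actual billing code. The function name is hypothetical, and the sketch assumes that a task lasting exactly 60 seconds bills as one unit and that every successful task bills at least one unit.

```python
import math

CREDITS_PER_UNIT = 20  # flat rate stated above
UNIT_SECONDS = 60      # length of one billing unit


def credits_for_task(gpu_seconds: float, succeeded: bool = True) -> int:
    """Credits billed for one task under the stated rules (hypothetical helper)."""
    if not succeeded:
        return 0  # failed tasks don't consume credits
    # Assumption: round up to the next whole unit, with a minimum of one unit.
    units = max(1, math.ceil(gpu_seconds / UNIT_SECONDS))
    return units * CREDITS_PER_UNIT


print(credits_for_task(13))   # one unit
print(credits_for_task(61))   # two units
print(credits_for_task(90, succeeded=False))
```

Under these assumptions a 13-second task and a 40-second task both cost 20 credits, while a 61-second task costs 40.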

Template-Native APIs

Publish a reusable compute template, then invoke it through a simple API — perfect for product teams, internal tools, and customer-facing workloads.

Built For Operations

Automatic scheduling, capacity balancing, and failover keep workloads moving even when demand spikes or nodes degrade.

Stop Overpaying for GPU Rentals

Self-hosted GPU vs RunKey.ai cost comparison

Self-Hosted RTX 4090 Server

Dedicated RTX 4090 rental: $0.79/hour
Monthly minimum cost: $569/month
Requires DevOps maintenance
Manual environment setup
Manual scaling management
Paying even when idle
Average monthly cost: $569 - $1,500+

RunKey.ai GPU Cloud Compute (Recommended)

RTX 4090 on-demand allocation: Per-task
Minimum spend: $0/month
Zero maintenance, fully managed
Upload templates and go
Automatic elastic scaling
Pay only for successful tasks
99.9% SLA guarantee
Average monthly cost: Pay-as-you-go
View Pricing
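To see where the two options cross over, the comparison can be turned into a small break-even calculation. The $569 monthly minimum matches $0.79/hour for 24 hours over 30 days ($568.80). The sketch below is an estimate under stated assumptions: it takes the per-credit price of the 10,000-credit pack listed on this page ($69.99) and ignores any DevOps or setup cost on the self-hosted side. The function name is hypothetical.

```python
import math

SELF_HOSTED_MONTHLY_USD = 569.00  # monthly minimum quoted above
CREDITS_PER_UNIT = 20             # one 60-second billing unit


def break_even_units(usd_per_credit: float) -> int:
    """Most billed units per month that still cost no more than self-hosting."""
    usd_per_unit = usd_per_credit * CREDITS_PER_UNIT
    return math.floor(SELF_HOSTED_MONTHLY_USD / usd_per_unit)


# Assumption: credits bought at the 10,000-credit pack price of $69.99.
units = break_even_units(69.99 / 10_000)
print(units, "units, about", round(units / 60, 1), "billed GPU-hours per month")
```

At that credit price the break-even is roughly 68 billed GPU-hours a month. Below that volume, per-task billing costs less than the rental minimum; above it, a dedicated card may be cheaper on hardware cost alone.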

Enterprise-Grade Elastic Compute Infrastructure

Intelligent scheduling, self-healing nodes, elastic scaling — GPU compute that never stops

Elastic Auto-Scaling

Automatically scale GPU nodes up or down based on load — instant scaling during peaks, automatic release when idle

Automatic Node Failover

When any node fails, the platform automatically detects and replaces it with a healthy node — zero task interruption

Intelligent Scheduling Engine

Routes each task to the closest available RTX 4090 node based on queue depth and load — no manual placement, no GPU selection
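As an illustration only, routing by queue depth and load could look like the sketch below. RunKey's real scheduler is not documented here, so the `Node` fields, the tie-breaking order, and the `pick_node` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    queue_depth: int      # tasks waiting on this node
    load: float           # current GPU utilization, 0.0 to 1.0
    healthy: bool = True  # unhealthy nodes are skipped entirely


def pick_node(nodes: list[Node]) -> Node:
    """Pick the healthy node with the shortest queue, breaking ties by load."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        raise RuntimeError("no healthy node available")
    return min(candidates, key=lambda n: (n.queue_depth, n.load))


nodes = [
    Node("node-a", queue_depth=3, load=0.9),
    Node("node-b", queue_depth=1, load=0.7),
    Node("node-c", queue_depth=0, load=0.2, healthy=False),
]
print(pick_node(nodes).name)
```

Here `node-c` has the emptiest queue but is marked unhealthy, so the task goes to `node-b`.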

Fast Task Launch

Pre-warmed resource pools and rapid scheduling: on average, tasks begin executing about 2 seconds after submission

Full-Stack Monitoring

Real-time monitoring of GPU utilization, task queues, and node health with automatic anomaly alerting

High-Availability SLA

99.9% uptime commitment with multi-region redundant deployments ensuring business continuity
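For a sense of scale, a 99.9% commitment can be converted into a downtime budget. The sketch assumes the SLA is measured over a 30-day month; the measurement window is not stated on this page.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted in a window of `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


print(round(allowed_downtime_minutes(99.9), 1))
```

Under that assumption, 99.9% leaves about 43 minutes of allowable downtime per month.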

Get Started in Four Simple Steps

1. Create Your Template

Start in the console by creating a compute template, adding its name, description, and the basic information your team needs to manage it.

2. Upload And Configure

Upload your template definition and define the input, output, concurrency, and runtime settings needed for production use.

3. Submit And Publish

Submit the template for review. RunKey validates the per-unit rate, max execution timeout, and configuration before publishing.

4. Get API Key And Invoke

Create your API key, invoke your template through a single API call, and let RunKey allocate a GPU per task — execution and result delivery handled automatically.

GPU Compute in Just a Few Lines of Code

Developer-Friendly RESTful API

No GPU programming expertise required. No CUDA installation needed. One HTTP request, and RunKey.ai handles all GPU scheduling and inference computation for you.

  • Standard RESTful API, callable from any language
  • Automatic GPU scheduling — no hardware selection needed
  • Asynchronous task processing with long-running support
  • Webhook auto-delivery of results
  • Comprehensive error codes and retry mechanisms
  • Official Python / Node.js SDKs
View Full Documentation
# 1. Submit an AI task to the GPU cluster
curl -X POST https://api.runkey.ai/v1/templates/tpl_portrait_fix/run \
  -H "Authorization: Bearer rk_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "source_image": "https://your-server.com/photo.jpg",
      "quality": "high"
    }
  }'

# Response: Task submitted, GPU is processing
# {"task_id": "task_abc123", "status": "queued", "estimated_time": 10}

# 2. Query task result
curl https://api.runkey.ai/v1/tasks/task_abc123 \
  -H "Authorization: Bearer rk_live_xxxxx"

# Response: Processing complete
# {"status": "completed", "output": {"result_url": "https://cdn.runkey.ai/..."}}
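The same two calls can be made from Python using only the standard library. This is a sketch rather than the official SDK: the endpoints, headers, and the `queued` and `completed` statuses come from the curl example above, while the function names, the polling interval, and the `failed` status are assumptions.

```python
import json
import time
import urllib.request

API_BASE = "https://api.runkey.ai/v1"  # base URL taken from the curl example


def build_run_request(template_id: str, api_key: str, inputs: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in step 1."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/templates/{template_id}/run",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_task(task_id, api_key, fetch=send, interval=2.0, max_polls=30):
    """Poll GET /tasks/{task_id} until the task reaches a terminal status."""
    request = urllib.request.Request(
        f"{API_BASE}/tasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    for _ in range(max_polls):
        task = fetch(request)
        # "completed" appears in the example response; "failed" is an
        # assumed terminal status that the example does not confirm.
        if task.get("status") in ("completed", "failed"):
            return task
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} not finished after {max_polls} polls")
```

A typical flow would be `task = send(build_run_request("tpl_portrait_fix", api_key, inputs))` followed by `wait_for_task(task["task_id"], api_key)`. The `fetch` parameter exists so the polling loop can be exercised without network access. For long-running tasks, the webhook delivery listed above is likely a better fit than polling.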
Single GPU · Fixed Unit Price

Powered by NVIDIA RTX 4090

We standardized on a single GPU on purpose. One card means one fixed per-unit rate, and a quote you can run in your head before you ever invoke a task.

Standard Compute · Only Option

NVIDIA RTX 4090

Ada Lovelace · 24GB GDDR6X

VRAM: 24 GB GDDR6X
Memory Bandwidth: 1.01 TB/s
Peak FP32: 82.6 TFLOPS
Tensor (FP8): 1,321 TFLOPS with sparsity
Typical Workloads
  • Image generation (SDXL, Flux, ControlNet)
  • Short-form video generation and frame interpolation
  • LLM inference up to ~13B parameters
  • Voice cloning, TTS, and audio models
  • LoRA fine-tuning and embedding pipelines

One GPU, one rate

Every task on every account runs on the same NVIDIA RTX 4090 — no premium tier surcharge, no GPU selector.

Fully predictable cost

Same hardware every run means same throughput. 20 credits per 60-second unit, always.

No tier-shopping

Skip the spec-sheet decision. Build your template, ship it, and let the scheduler place every task.

Fixed Unit Price
20 credits / 60-second unit

Same rate for every workload, every account, every task. Bulk credit packs only change the per-credit dollar price — never the per-unit rate.

Transparent Pay-As-You-Go Pricing

Pay in 60-second units of actual GPU time — only on successful tasks, never for idle infrastructure

Basic: $15.99 for 600 credits
Pro (Recommended): $29.99 for 3,000 credits
Max: $69.99 for 10,000 credits
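Because the per-unit rate is fixed at 20 credits, each pack translates directly into a number of 60-second units and an effective dollar price per unit. The sketch below does that arithmetic with the prices listed above; the `pack_summary` helper is hypothetical.

```python
CREDITS_PER_UNIT = 20  # one 60-second billing unit

# Pack prices (USD) and credit amounts as listed above.
PACKS = {
    "Basic": (15.99, 600),
    "Pro": (29.99, 3_000),
    "Max": (69.99, 10_000),
}


def pack_summary(price_usd: float, credits: int) -> tuple[int, float]:
    """Return (billing units in the pack, effective USD per unit)."""
    units = credits // CREDITS_PER_UNIT
    return units, price_usd / units


for name, (price, credits) in PACKS.items():
    units, usd_per_unit = pack_summary(price, credits)
    print(f"{name}: {units} units, ${usd_per_unit:.3f} per 60-second unit")
```

That works out to 30, 150, and 500 units, at roughly $0.53, $0.20, and $0.14 per unit respectively.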
Creator Revenue Share

Earn 30% when others run your templates

Publish a template once and let other RunKey users invoke it. Every external run pays you 30% of the credits spent — automatically tracked, and paid out in USDC on Polygon straight from the console. No application, no separate contract; revenue share is on by default for every published template.

30% Of Every External Run

When another account invokes your published template, 30% of the credits they spend become creator earnings. Your own runs of your own templates are excluded.

Tracked In Real Time

The console's Earnings page breaks revenue down per-template, separates pending vs. available balance, and lists every payout you've ever requested.

Withdraw From $200

Once your available balance reaches $200, request a payout in USDC on the Polygon network. No subscription, no monthly fee — just earnings.
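The revenue-share rules reduce to two small checks, sketched below. The record fields (`account_id`, `credits_spent`) and function names are hypothetical, and the page does not say how earned credits convert into a dollar balance, so the sketch keeps earnings in credits and treats the balance check separately.

```python
CREATOR_SHARE = 0.30          # share of credits from external runs
PAYOUT_THRESHOLD_USD = 200.0  # minimum available balance for a payout


def creator_earnings(runs: list[dict], creator_id: str) -> float:
    """Credits earned by a creator; the creator's own runs are excluded."""
    external = sum(r["credits_spent"] for r in runs if r["account_id"] != creator_id)
    return external * CREATOR_SHARE


def can_withdraw(available_balance_usd: float) -> bool:
    """True once the available balance reaches the payout threshold."""
    return available_balance_usd >= PAYOUT_THRESHOLD_USD


runs = [
    {"account_id": "acct_a", "credits_spent": 100},
    {"account_id": "creator", "credits_spent": 40},  # own run, excluded
    {"account_id": "acct_b", "credits_spent": 60},
]
print(creator_earnings(runs, "creator"))
```

In this example only the 160 credits spent by other accounts count, so the creator earns 48 credits.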

Frequently Asked Questions

The most common questions from teams evaluating task-based GPU infrastructure for production workloads.

RunKey bills in 60-second units of actual GPU execution time, rounded up to the next unit — a task running 13s and one running 40s both bill as one unit. You don't pay for idle rental hours. Costs stay predictable because every task on the same template runs in roughly the same time, and each template defines a max execution timeout — runaway tasks get auto-cut so spend can't spiral. Failed tasks don't consume credits.
Failed runs do not consume credits. This makes costs more predictable for product teams because you are paying for completed work rather than raw machine uptime.
No — there's nothing to choose. RunKey runs every task on a single hardware target: the NVIDIA RTX 4090 (24GB GDDR6X). One GPU means one fixed per-unit rate (20 credits per 60-second unit), no premium-tier surcharge, and a quote you can do in your head. Scaling, failover, and routing are still fully managed by the platform.
Yes. RunKey is built for API-driven usage. Teams can publish templates, configure parameters, generate API keys, and invoke tasks from internal systems, customer-facing products, or batch pipelines.
Yes. The platform is positioned for business and technical teams that need predictable GPU execution, template-based delivery, managed operations, and infrastructure-grade reliability without maintaining their own cluster.
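The max execution timeout mentioned in the first answer also gives a hard ceiling on spend, which can be worked out before a batch is submitted. This is a sketch under one assumption: a task cut off at its timeout is billed for at most the time up to that timeout. The function names are hypothetical.

```python
import math

CREDITS_PER_UNIT = 20
UNIT_SECONDS = 60


def max_credits_per_task(timeout_seconds: int) -> int:
    """Upper bound on credits for one task that is cut off at its timeout."""
    return math.ceil(timeout_seconds / UNIT_SECONDS) * CREDITS_PER_UNIT


def max_credits_for_batch(task_count: int, timeout_seconds: int) -> int:
    """Worst-case credits for a batch where every task runs to its timeout."""
    return task_count * max_credits_per_task(timeout_seconds)


print(max_credits_for_batch(1_000, 90))
```

With a 90-second timeout, no single task can exceed 40 credits, so a 1,000-task batch is bounded at 40,000 credits even if every task runs to the limit.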

Run GPU workloads with predictable economics

Infrastructure-grade execution for AI products, internal tooling, and enterprise pipelines.

View Documentation

On-demand compute · 60-second billing units · Capped per task