RTX 4090 GPU · Fixed Per-Unit Pricing

GPU compute, billed by the task

Save your AI template once, run it through a simple API. Predictable per-task pricing — no GPU rentals, no subscriptions.


RTX 4090

Single GPU, every task

20 credits / unit

Fixed credits per 60s

99.9%

Platform SLA

on-demand runtime

$ curl -X POST https://api.runkey.ai/v1/run \
    -H "Authorization: Bearer rk_..." \
    -d '{"template": "tpl_sr"}'

✓ GPU assigned (RTX 4090 · 24GB)

✓ Task completed 2.4s

billing mode: 20 credits / 60s · fixed rate


Why Choose RunKey.ai

We handle the full complexity of GPU infrastructure so you can focus on building AI applications

One GPU, One Rate

Every task runs on an NVIDIA RTX 4090. No GPU model selector, no premium tier surcharge — just one card, one fixed per-unit rate across the entire platform.

Fixed 60-Second Billing Units

Tasks bill at a flat 20 credits per 60-second unit of actual GPU time, rounded up to the next unit. Same hardware, same rate, every time. Failed tasks don't consume credits.
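As a rough illustration of the billing rule above, the sketch below computes the credits for a single task. The function name and the assumption that a successful task always bills at least one unit are illustrative, not taken from RunKey documentation.

```python
import math

CREDITS_PER_UNIT = 20   # flat rate stated on this page
UNIT_SECONDS = 60       # length of one billing unit

def credits_for_task(gpu_seconds: float, succeeded: bool = True) -> int:
    """Credits charged for one task under the rounding rule described above."""
    if not succeeded:
        return 0  # failed tasks don't consume credits
    # Assumption: a successful task always bills at least one unit.
    units = max(1, math.ceil(gpu_seconds / UNIT_SECONDS))
    return units * CREDITS_PER_UNIT

print(credits_for_task(13))                    # 20
print(credits_for_task(61))                    # 40
print(credits_for_task(45, succeeded=False))   # 0
```

A 13-second task and a 40-second task both land in the first unit, while a 61-second task rolls over into a second one.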

Template-Native APIs

Publish a reusable compute template, then invoke it through a simple API — perfect for product teams, internal tools, and customer-facing workloads.

Built For Operations

Automatic scheduling, capacity balancing, and failover keep workloads moving even when demand spikes or nodes degrade.

Stop Overpaying for GPU Rentals

Self-hosted GPU vs RunKey.ai cost comparison

Self-Hosted RTX 4090 Server

✗ Dedicated RTX 4090 rental: $0.79/hour

✗ Monthly minimum cost: $569/month

✗ Requires DevOps maintenance

✗ Manual environment setup

✗ Manual scaling management

✗ Paying even when idle

Average monthly cost: $569 - $1,500+
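The $569/month figure follows from the hourly rate if the server is rented around the clock. The sketch below reproduces that arithmetic and shows how idle time raises the effective price of each hour actually used. The 30-day month and the 10% utilization figure are assumptions chosen for illustration.

```python
HOURLY_RATE_USD = 0.79          # dedicated rental rate quoted above
HOURS_PER_MONTH = 24 * 30       # assumption: 30-day month, rented 24/7

def monthly_rental_cost(hourly_rate: float = HOURLY_RATE_USD,
                        hours: int = HOURS_PER_MONTH) -> float:
    """Fixed monthly cost of an always-on rental."""
    return round(hourly_rate * hours, 2)

def cost_per_busy_hour(monthly_cost: float, busy_hours: float) -> float:
    """Effective price of each hour the GPU is actually doing work."""
    return round(monthly_cost / busy_hours, 2)

monthly = monthly_rental_cost()
print(monthly)                            # 568.8
# Assumption for illustration: the GPU is busy 10% of the month (72 hours).
print(cost_per_busy_hour(monthly, 72))    # 7.9
```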

Recommended

RunKey.ai GPU Cloud Compute

✓ RTX 4090 on-demand allocation: per task

✓ Minimum spend: $0/month

✓ Zero maintenance, fully managed

✓ Upload templates and go

✓ Automatic elastic scaling

✓ Pay only for successful tasks

✓ 99.9% SLA guarantee

Average monthly cost: Pay-as-you-go


Enterprise-Grade Elastic Compute Infrastructure

Intelligent scheduling, self-healing nodes, elastic scaling — GPU compute that never stops

Elastic Auto-Scaling

Automatically scale GPU nodes up or down based on load — instant scaling during peaks, automatic release when idle

Automatic Node Failover

When any node fails, the platform automatically detects and replaces it with a healthy node — zero task interruption

Intelligent Scheduling Engine

Routes each task to an available RTX 4090 node chosen by queue depth and load, with no manual placement and no GPU selection
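To make the idea of routing by queue depth and load concrete, here is a toy placement function. It is not RunKey's scheduler: the `Node` fields and the tie-breaking order (queue depth first, then load) are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Node:
    name: str
    healthy: bool
    queue_depth: int   # tasks waiting on this node (hypothetical field)
    load: float        # current utilization, 0.0 to 1.0 (hypothetical field)

def pick_node(nodes: Sequence[Node]) -> Optional[Node]:
    """Pick the healthy node with the shortest queue, breaking ties by load."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        return None  # nothing available; a real system would queue or scale up
    return min(candidates, key=lambda n: (n.queue_depth, n.load))

nodes = [
    Node("node-a", healthy=True, queue_depth=3, load=0.40),
    Node("node-b", healthy=True, queue_depth=1, load=0.90),
    Node("node-c", healthy=False, queue_depth=0, load=0.00),
    Node("node-d", healthy=True, queue_depth=1, load=0.55),
]
print(pick_node(nodes).name)   # node-d
```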

Fast Task Launch

Pre-warmed resource pools and rapid scheduling mean tasks begin executing about 2 seconds after submission, on average

Full-Stack Monitoring

Real-time monitoring of GPU utilization, task queues, and node health with automatic anomaly alerting

High-Availability SLA

99.9% uptime commitment with multi-region redundant deployments ensuring business continuity
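For a sense of scale, a 99.9% uptime figure can be converted into a downtime budget. The sketch below does that conversion for a 30-day month. How RunKey actually measures the SLA window is not stated on this page, so the period length is an assumption.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted over `days` at a given uptime percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)

# Assumption: the SLA is evaluated over a 30-day window.
print(round(allowed_downtime_minutes(99.9), 1))   # 43.2
```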

Get Started in Four Simple Steps

Create Your Template

Start in the console by creating a compute template, adding its name, description, and the basic information your team needs to manage it.

Upload And Configure

Upload your template definition and define the input, output, concurrency, and runtime settings needed for production use.

Submit And Publish

Submit the template for review. RunKey validates the per-unit rate, max execution timeout, and configuration before publishing.

Get API Key And Invoke

Create your API key, invoke your template through a single API call, and let RunKey allocate a GPU per task — execution and result delivery handled automatically.
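The final step might look like this from Python, using only the standard library. The endpoint path, header names, and `inputs` body mirror the curl sample on this page. The helper name `build_run_request` is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.runkey.ai/v1"   # base URL as used in the curl sample

def build_run_request(template_id: str, api_key: str,
                      inputs: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that invokes a published template."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/templates/{template_id}/run",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_run_request("tpl_portrait_fix", "rk_live_xxxxx", {"quality": "high"})
print(req.get_method(), req.full_url)
# To send it for real: urllib.request.urlopen(req)
```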

GPU Compute in Just a Few Lines of Code

Developer-Friendly RESTful API

No GPU programming expertise required. No CUDA installation needed. One HTTP request, and RunKey.ai handles all GPU scheduling and inference computation for you.

Standard RESTful API, callable from any language
Automatic GPU scheduling — no hardware selection needed
Asynchronous task processing with long-running support
Webhook auto-delivery of results
Comprehensive error codes and retry mechanisms
Official Python / Node.js SDKs
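The list mentions retry mechanisms without describing them. One common client-side pattern is exponential backoff, sketched below. The `TransientError` class, the delay values, and the attempt limit are all assumptions for illustration, not the behavior of the official SDKs.

```python
import time

class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. a 5xx or timeout)."""

def call_with_retry(fn, max_attempts: int = 4, base_delay: float = 0.5,
                    sleep=time.sleep):
    """Call fn(), retrying on TransientError with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...

# Demo with a fake call that fails twice, then succeeds (no real sleeping).
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("temporary failure")
    return "ok"

delays = []
print(call_with_retry(flaky, sleep=delays.append))   # ok
print(delays)                                        # [0.5, 1.0]
```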


# 1. Submit an AI task to the GPU cluster
curl -X POST https://api.runkey.ai/v1/templates/tpl_portrait_fix/run \
  -H "Authorization: Bearer rk_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "source_image": "https://your-server.com/photo.jpg",
      "quality": "high"
    }
  }'

# Response: Task submitted, GPU is processing
# {"task_id": "task_abc123", "status": "queued", "estimated_time": 10}

# 2. Query task result
curl https://api.runkey.ai/v1/tasks/task_abc123 \
  -H "Authorization: Bearer rk_live_xxxxx"

# Response: Processing complete
# {"status": "completed", "output": {"result_url": "https://cdn.runkey.ai/..."}}
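The submit-then-query flow above implies a polling loop on the client. The sketch below shows one way to write it. The HTTP call is injected as `fetch_status` so the loop can be exercised without network access. The `"failed"` status, the interval, and the timeout are assumptions, since only `"queued"` and `"completed"` appear in the sample responses.

```python
import time

def wait_for_task(fetch_status, task_id: str, interval: float = 2.0,
                  timeout: float = 300.0, sleep=time.sleep,
                  clock=time.monotonic) -> dict:
    """Poll until the task completes; fetch_status(task_id) returns the task JSON."""
    deadline = clock() + timeout
    while True:
        task = fetch_status(task_id)
        if task["status"] == "completed":
            return task
        if task["status"] == "failed":   # assumption: not shown on this page
            raise RuntimeError(f"task {task_id} failed")
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} still {task['status']}")
        sleep(interval)

# Demo with canned responses standing in for GET /v1/tasks/{task_id}.
responses = iter([
    {"status": "queued"},
    {"status": "completed", "output": {"result_url": "https://cdn.runkey.ai/..."}},
])
result = wait_for_task(lambda _id: next(responses), "task_abc123",
                       sleep=lambda s: None)
print(result["status"])   # completed
```

In production the webhook delivery mentioned above would avoid polling altogether. The loop is mainly useful for scripts and quick tests.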

Single GPU · Fixed Unit Price

Powered by NVIDIA RTX 4090

We standardized on a single GPU on purpose. One card means one fixed per-unit rate, and a quote you can run in your head before you ever invoke a task.

Standard Compute · Only Option

NVIDIA RTX 4090

Ada Lovelace · 24GB GDDR6X

VRAM: 24 GB GDDR6X

Memory Bandwidth: 1.01 TB/s

Peak FP32: 82.6 TFLOPS

Tensor (FP8): 1,321 TFLOPS with sparsity

Typical Workloads

› Image generation (SDXL, Flux, ControlNet)

› Short-form video generation and frame interpolation

› LLM inference up to ~13B parameters

› Voice cloning, TTS, and audio models

› LoRA fine-tuning and embedding pipelines
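The ~13B parameter ceiling for LLM inference is easier to interpret with a quick memory estimate. The sketch below counts weight storage only, ignoring the KV cache and activations, and uses decimal gigabytes. The precisions compared are assumptions, since the page does not say which one the ~13B figure refers to.

```python
VRAM_GB = 24   # RTX 4090 memory stated on this page

def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight size in GB: 1e9 params x bytes, divided by 1e9 bytes/GB."""
    return params_billion * bytes_per_param

# Weights only; KV cache and activations need additional headroom.
for label, bytes_per_param in [("FP16", 2), ("INT8", 1), ("4-bit", 0.5)]:
    size = weights_gb(13, bytes_per_param)
    verdict = "fits" if size < VRAM_GB else "does not fit"
    print(f"13B @ {label}: {size} GB, {verdict}")
```

By this estimate a 13B model at FP16 needs about 26 GB for weights alone, so reaching that size on a 24 GB card implies some form of quantization.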

One GPU, one rate

Every task on every account runs on the same NVIDIA RTX 4090 — no premium tier surcharge, no GPU selector.

Fully predictable cost

Same hardware every run means same throughput. 20 credits per 60-second unit, always.

No tier-shopping

Skip the spec-sheet decision. Build your template, ship it, and let the scheduler place every task.

Fixed Unit Price

20 credits / 60-second unit

Same rate for every workload, every account, every task. Bulk credit packs only change the per-credit dollar price — never the per-unit rate.
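Since the per-unit rate is fixed at 20 credits, the dollar cost of a unit depends only on which credit pack was purchased. The sketch below works that out using the Basic, Pro, and Max pack prices listed in the pricing section of this page. The function name is illustrative.

```python
CREDITS_PER_UNIT = 20   # fixed rate per 60-second unit

# (price in USD, credits) for each pack, as listed in the pricing section.
PACKS = {
    "Basic": (15.99, 600),
    "Pro": (29.99, 3600),
    "Max": (69.99, 9000),
}

def usd_per_unit(price_usd: float, credits: int) -> float:
    """Dollar cost of one 60-second unit when credits come from this pack."""
    return price_usd / credits * CREDITS_PER_UNIT

for name, (price, credits) in PACKS.items():
    units = credits // CREDITS_PER_UNIT
    print(f"{name}: {units} units, ${usd_per_unit(price, credits):.3f} per unit")
```

The credit rate never changes, but the same 60-second unit costs roughly $0.53 on Basic and roughly $0.16 on Max.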

Transparent Pay-As-You-Go Pricing

Pay in 60-second units of actual GPU time — only on successful tasks, never for idle infrastructure

Basic: $15.99 for 600 credits

Pro (Recommended): $29.99 for 3,600 credits

Max: $69.99 for 9,000 credits


Creator Revenue Share

Earn 30% when others run your templates

Publish a template once and let other RunKey users invoke it. Every external run pays you 30% of the credits spent — automatically tracked, and paid out in USDC on Polygon straight from the console. No application, no separate contract; revenue share is on by default for every published template.

30% Of Every External Run

When another account invokes your published template, 30% of the credits they spend become creator earnings. Your own runs of your own templates are excluded.
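The revenue-share rule can be expressed as a small calculation. The sketch below sums credits from external runs and applies the 30% share, skipping the creator's own runs. The record layout (`account`, `credits`) is hypothetical, and the result stays in credits because the credit-to-USD conversion for payouts is not stated on this page.

```python
from decimal import Decimal

CREATOR_SHARE = Decimal("0.30")   # 30% of credits spent on external runs

def creator_earnings(runs, owner_id: str) -> Decimal:
    """Earnings in credits: 30% of what other accounts spent on the template."""
    external_credits = sum(r["credits"] for r in runs if r["account"] != owner_id)
    return external_credits * CREATOR_SHARE

runs = [
    {"account": "creator_1", "credits": 40},   # own run: excluded
    {"account": "user_a", "credits": 20},
    {"account": "user_b", "credits": 60},
]
print(creator_earnings(runs, owner_id="creator_1"))   # 24.00
```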

Tracked In Real Time

The console's Earnings page breaks revenue down per-template, separates pending vs. available balance, and lists every payout you've ever requested.

Withdraw From $200

Once your available balance reaches $200, request a payout in USDC on the Polygon network. No subscription, no monthly fee — just earnings.


Frequently Asked Questions

The most common questions from teams evaluating task-based GPU infrastructure for production workloads.

RunKey bills in 60-second units of actual GPU execution time, rounded up to the next unit — a task running 13s and one running 40s both bill as one unit. You don't pay for idle rental hours. Costs stay predictable because every task on the same template runs in roughly the same time, and each template defines a max execution timeout — runaway tasks get auto-cut so spend can't spiral. Failed tasks don't consume credits.
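Because each template has a max execution timeout, spend per task has a hard ceiling. The sketch below turns a timeout into a worst-case budget for a batch. The 90-second timeout and the batch size are made-up values for illustration.

```python
import math

CREDITS_PER_UNIT = 20
UNIT_SECONDS = 60

def max_credits_per_task(timeout_seconds: float) -> int:
    """Upper bound on credits for one task that runs right up to the timeout."""
    return math.ceil(timeout_seconds / UNIT_SECONDS) * CREDITS_PER_UNIT

def max_batch_credits(n_tasks: int, timeout_seconds: float) -> int:
    """Worst-case credits for a batch where every task hits the timeout."""
    return n_tasks * max_credits_per_task(timeout_seconds)

# Assumption for illustration: a template with a 90-second timeout.
print(max_credits_per_task(90))        # 40
print(max_batch_credits(500, 90))      # 20000
```

Actual spend will usually be lower, since most tasks finish well before the timeout and failed tasks are not billed.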

Failed runs do not consume credits. This makes costs more predictable for product teams because you are paying for completed work rather than raw machine uptime.

You don't need to choose a GPU, because there's nothing to choose. RunKey runs every task on a single hardware target: the NVIDIA RTX 4090 (24GB GDDR6X). One GPU means one fixed per-unit rate (20 credits per 60-second unit), no premium-tier surcharge, and a quote you can do in your head. Scaling, failover, and routing are still fully managed by the platform.

RunKey is built for API-driven usage. Teams can publish templates, configure parameters, generate API keys, and invoke tasks from internal systems, customer-facing products, or batch pipelines.

RunKey is positioned for business and technical teams that need predictable GPU execution, template-based delivery, managed operations, and infrastructure-grade reliability without maintaining their own cluster.

Run GPU workloads with predictable economics

Infrastructure-grade execution for AI products, internal tooling, and enterprise pipelines.


On-demand compute · 60-second billing units · Capped per task