Live results · Jun 16, 2026

Cheap AI Model Tests — Everything Under $3 / Million Tokens

Picking a budget LLM usually means guessing. So we stopped guessing. We took every model priced under $3 per million output tokens that you can reach from a free trial, and sent each one the exact same prompts through the live All AI Ask API. Everything below (speed, cost, and the model outputs themselves) comes from real API calls, not marketing decks.

3 tasks tested · 19 models · 8 providers · 57 live API runs

The three tests

Code Generation

Writing a Code Snippet

A focused coding task: produce a correct, efficient, 0-indexed iterative Fibonacci function in Python — and nothing but the code.

Winner: GPT-5.4 Nano (100/100)
View full results →
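
For reference, here is a minimal sketch of the kind of answer this task asks for. It assumes the common convention that index 0 maps to 0 and index 1 maps to 1; the function name `fib` and the negative-input check are our own illustrative choices, not part of the published prompt or grading criteria.

```python
def fib(n: int) -> int:
    """Return the n-th Fibonacci number, 0-indexed (fib(0) == 0, fib(1) == 1)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1
    # Advance the pair (F(i), F(i+1)) n times; O(n) time, O(1) extra space.
    for _ in range(n):
        a, b = b, a + b
    return a
```

The loop keeps only two running values, which is what makes the iterative version efficient compared with naive recursion.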
Copywriting

Writing a Short Paragraph

A plain-English writing task: explain what an API is to a non-technical small-business owner in 3–4 jargon-free sentences using one analogy.

Winner: GPT-5.4 Nano (97/100)
View full results →
Structured Data

Extracting Structured Data

A structured-output task: read one sentence and return strict JSON with a string name, numeric price, and boolean stock flag — no markdown, no prose.

Winner: GPT-5.4 Nano (100/100)
View full results →
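
A rough sketch of how a strict-JSON answer to this task could be checked. The key names `name`, `price`, and `in_stock` are assumptions for illustration; the actual prompt may use different keys, and this is not the grader used for the scores above.

```python
import json


def is_valid_extraction(raw: str) -> bool:
    """Check that raw is bare JSON with a string name, numeric price, boolean stock flag."""
    try:
        data = json.loads(raw)  # fails on markdown fences or surrounding prose
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    name = data.get("name")        # hypothetical key
    price = data.get("price")      # hypothetical key
    stock = data.get("in_stock")   # hypothetical key
    return (
        isinstance(name, str)
        # bool is a subclass of int in Python, so exclude it explicitly
        and isinstance(price, (int, float)) and not isinstance(price, bool)
        and isinstance(stock, bool)
    )
```

A reply such as `{"name": "Desk Lamp", "price": 24.99, "in_stock": true}` would pass, while the same object wrapped in prose, or with the price quoted as a string, would not.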

Overall leaderboard

Averaged across all 3 tasks. Accuracy is graded by the agent against each task's published criteria.

| # | Model | Provider | Avg accuracy | Avg speed | Total cost |
|---|---|---|---|---|---|
| 1 🥇 | GPT-5.4 Nano | OpenAI | 99 | 57.4 t/s | $0.000274 |
| 2 | Gemini 3.1 Flash Lite | Google | 97 | 82.6 t/s | $0.000339 |
| 3 | Codestral | Mistral | 96.3 | 92.4 t/s | $0.000227 |
| 4 | Mistral Medium 3 | Mistral | 96 | 35.7 t/s | $0.000505 |
| 5 | Llama 3.1 8B | Groq | 95.3 | 274.2 t/s | $0.000029 |
| 6 | Mistral Small 3.1 | Mistral | 95.3 | 77.3 t/s | $0.000161 |
| 7 | Llama 3.3 70B | Groq | 95 | 181.9 t/s | $0.000324 |
| 8 | Amazon Nova Micro | Amazon | 91.3 | 102.6 t/s | $0.000029 |
| 9 | Amazon Nova Lite | Amazon | 90.3 | 110.1 t/s | $0.00006 |
| 10 | Ministral 8B | Mistral | 90 | 60 t/s | $0.000061 |
| 11 | Llama 4 Scout | Groq | 89.7 | 194.8 t/s | $0.000101 |
| 12 | DeepSeek V4 Pro | DeepSeek | 82.7 | 80.5 t/s | $0.000684 |
| 13 | DeepSeek V4 Flash | DeepSeek | 82 | 66.4 t/s | $0.000173 |
| 14 | Grok 4.3 | xAI | 80.7 | 28.9 t/s | $0.000984 |
| 15 | GPT-OSS 120B | Groq | 80.7 | 322.3 t/s | $0.000309 |
| 16 | GPT-OSS 20B | Groq | 80 | 502.9 t/s | $0.000205 |
| 17 | GPT-OSS 120B (Cerebras) | Cerebras | 77.3 | 490.4 t/s | $0.000534 |
| 18 | GLM 4.7 (Cerebras) | Cerebras | 75.7 | 531.4 t/s | $0.00511 |
| 19 | Qwen 3 32B | Groq | 73.7 | 351.5 t/s | $0.001475 |
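
As a quick check on how the accuracy column comes together, the sketch below averages the three per-task scores reported above for GPT-5.4 Nano (100, 97, 100). The helper name and the rounding to one decimal place are assumptions, chosen to match how the leaderboard values appear.

```python
def average_accuracy(scores):
    """Mean of per-task accuracy scores, rounded to one decimal place."""
    return round(sum(scores) / len(scores), 1)


# Per-task scores for GPT-5.4 Nano: code generation, copywriting, structured data.
gpt_nano_avg = average_accuracy([100, 97, 100])
print(gpt_nano_avg)  # 99.0
```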

How we tested

  • The cohort: every model under $3 / million output tokens reachable from a trial account (19 models, 8 providers).
  • Identical prompts: each model received the same prompt with default settings — one model per request.
  • Real metrics: latency, token counts, and cost are returned directly by the API for each run.
  • Accuracy: graded 0–100 by the agent against each task's published criteria — the full output for every model is shown so you can check the grading yourself.
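
The procedure above can be sketched roughly as follows. This is not the harness behind these results: `call_model` is a hypothetical stand-in for whatever API client you use, the response field names (`text`, `output_tokens`, `latency_s`, `cost_usd`) are invented for illustration, and deriving tokens per second as output tokens divided by latency is our assumption.

```python
from typing import Callable, Dict, List


def run_benchmark(
    models: List[str],
    prompt: str,
    call_model: Callable[[str, str], Dict],
) -> List[Dict]:
    """Send the same prompt to each model, one request per model, and collect metrics."""
    results = []
    for model in models:
        reply = call_model(model, prompt)  # hypothetical client call
        results.append({
            "model": model,
            "output": reply["text"],
            # Assumed definition of speed: output tokens per second of latency.
            "tokens_per_second": reply["output_tokens"] / reply["latency_s"],
            "cost_usd": reply["cost_usd"],
        })
    return results


def fake_call(model: str, prompt: str) -> Dict:
    """Stub client with made-up values, so the sketch runs without network access."""
    return {"text": "ok", "output_tokens": 50, "latency_s": 0.5, "cost_usd": 0.00001}


rows = run_benchmark(["model-a", "model-b"], "Say ok.", fake_call)
```

Passing the client in as a function keeps the loop independent of any one provider, so the stub can be swapped for a real client without touching the comparison logic.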

Run your own prompt across all of them

Send one prompt to every cheap model at once and watch the speed, cost, and quality side by side.

Try the live playground free →