What it does

Stop juggling APIs.
Start shipping.

AIQ Load Manager is not a chat app. It's a background queue engine — fire off prompts and get on with your work.

⚡

Background queue processing

Prompts queue instantly and process in the background every 1.5–3 seconds. Rate limits are tracked in real time — nothing gets dropped or rejected.

Queue panel showing items in processing, complete, pending, and error states

🔀

Six intelligent routing modes

Auto, Balance, Cheapest, Fastest, Free Tier, and Manual modes direct each prompt to the right provider automatically. No manual switching.

Routing mode dropdown showing all six modes, with tier lock badges on Cheapest and Fastest

💰

Cost & token tracking

See exactly how many tokens and dollars you've spent per provider, per model, over any time period. Set budget caps to avoid surprises.

🗂️

Projects & persistent history

Organize prompts into separate projects, each with its own conversation thread. History is written to local SQLite on every turn — conversations survive app restarts exactly where you left off.

🏷️

Prompt type tags

Label each prompt with visual chip tags — Chat, Research, Code, Writing, Analysis, Image, Translate, or ⚡ Urgent. Tags drive provider selection automatically and, on Pro, lift your prompt's place in the queue. The special 🌐 Web Search tag also triggers live result injection before the AI call.

💡

Live cost & token estimation

See a live token count as you type and a ranked per-provider cost table before you hit queue. Know exactly what each prompt will cost on Claude vs. Groq vs. Gemini — before you send it.

Per-provider cost ranking table with estimated cost, model, and availability shown before queuing

⚖️

Compare mode Pro

Send the same prompt to multiple providers simultaneously. Responses arrive in parallel and display as side-by-side columns — provider name, model, full text, copy button, token count. Instantly see whether Claude, GPT-4o, or Gemini does better on your task.

Compare mode — three provider responses shown side by side with token counts and copy buttons

🌐

Real-time web search

Tag any prompt 🌐 Web Search and AIQ fetches live results before sending — then injects them into the system prompt. Works with every model, including fully local ones. No tool-calling support needed. Choose Tavily (1,000 free searches/month) or self-hosted SearXNG.

🔁

Auto-retry on failure

Network glitch? Temporary server error? The queue retries the prompt automatically — up to 3 times — without you lifting a finger. Permanent failures (wrong API key, spend blocked) surface immediately so you can act on them.

📋

Standing instructions

Set a global system prompt once in Settings and it's automatically prepended to every prompt you queue, across every provider. Define your preferred tone, language, output format, or any baseline rules — your own AI rulebook that travels with every request.

🎨

Response style presets

Per-provider tone and format controls — choose from Concise, Caveman (ultra-simple words), Bullet-only, ELI5, or write a fully custom instruction. Style text is appended to every prompt sent to that provider. Available on all plans.

📥

Results panel

A dedicated tab that shows every completed response as a full-text card — no expanding rows, no hunting through the queue. Search across prompts and responses, filter by project or provider, and copy any response with one click. A green badge on the nav icon counts how many results are waiting.

📂

Per-project response history

Every completed prompt for a project is stored and browseable. Click View history on any project card to see the full log of prompts, responses, provider/model used, and token counts. Available on all plans.

📄

Session digest export

Export your completed queue as a self-contained HTML file — dark theme, summary stats (items, tokens, cost), and collapsible prompt/response sections for each item. One click opens a native save dialog; the file opens immediately after saving. Starter and above.

🎯

Per-provider default model

Set a preferred model for any cloud provider so every unspecified queue item uses it automatically. Per-item model selection always overrides the default — this just sets your personal preference. Pro+ and above.

🐛

Built-in bug reporting

A Report a Bug button lives in the sidebar. One click opens a pre-filled GitHub issue with your app version and OS already populated — no copy-pasting, no hunting for version numbers.

🔒

100% local & private

Your prompts go directly to the AI provider — nothing passes through our servers. API keys are stored in your OS keychain. Anonymous usage analytics are collected to improve the app (no prompts, no keys, no personal data) — opt out any time in Settings.

🖥️

Native desktop app

Built with Electron and React. Runs natively on Windows 10/11 and macOS. Installs as a proper desktop app — no browser tab to lose.

Connectors panel showing all 12 AI providers — 5 local, 3 free cloud, 4 paid cloud

Agent Gateway

Your agent stack.
AIQ's routing.

Running agents through Hermes, n8n, or CrewAI? Point them at AIQ's local endpoint and they inherit AIQ's full routing, rate-limit queuing, cost tracking, and automatic provider fallback — without changing a line of agent code.

🔌

One config change

Change one line in your agent framework — set base_url to http://localhost:8787/v1. No code changes, no new packages, no API rewrites. Every framework that speaks the OpenAI API works instantly.

🚦

No more 429 errors

Long-running n8n workflows and multi-agent Hermes runs generate bursts of LLM calls that hit rate limits and fail silently. AIQ queues every call, waits for headroom, and retries automatically — your workflow never drops a request at 2 am.

🧭

Smart routing by model name

Pass aiq/auto, aiq/cheapest, aiq/fastest, or aiq/free as the model name and AIQ's routing engine picks the best provider. Or pass any real model name to pin a specific provider — your choice per call.

💰

Cost tracking across all agents

Every token your agents spend — Hermes, n8n, CrewAI, all of them — shows up in AIQ's Usage Dashboard. See real costs per provider, per project, and per session without adding observability tooling to each framework.

🔁

Automatic fallback

If your primary provider goes down mid-run, AIQ's router falls back to the next available provider automatically. Your agent keeps running. The switchover is invisible — the response comes back in the same format.

🔒

100% local

The gateway runs on your machine. No data goes to our servers — agent requests go from your framework, through AIQ, directly to the AI provider. Optional per-project API key prevents other local processes from using the endpoint without permission.

Compatible frameworks — change one config line

Hermes Agent · NousResearch
base_url: http://localhost:8787/v1

n8n · AI Agent node
OpenAI credential → Base URL field

OpenClaw · self-hosted
Provider config → base URL

LangGraph · LangChain
ChatOpenAI(base_url=...)

CrewAI · v1.12+
OPENAI_API_BASE in .env

AutoGen / AG2 · Microsoft
OAI_CONFIG_LIST base_url

Streaming (stream: true) fully supported. Available on all plans — free and paid. Ships in v0.6.0.

On the roadmap

Features worth building next

Informed by competitor research and user workflows — the gaps no other tool covers.

🐧 Linux native app

All tiers

The full desktop app on Linux — AppImage format runs on any distro without installation or root access, plus a .deb package for Debian and Ubuntu users. The electron-builder config is already in place. Needs a round of testing on a Linux CI runner before public release. All features parity with Windows and macOS.

Cross-provider conversation context

Starter

When the router re-routes a conversation to a different provider mid-thread — because of rate limits, cost, or availability — the stored history is automatically adapted to the new provider's format and forwarded. Your conversation keeps going, invisibly. No other tool can do this, because no other tool routes across providers in the first place.

Pro+ tier

Pro+

A tier above Pro for power users who need the full stack: Consensus mode, advanced prompt chaining, webhook output delivery, and priority support. Designed for teams and high-volume workflows. Details and pricing TBD.

Consensus mode

Pro+

Compare mode already collects every provider's response in one place. Consensus mode goes further: a meta-model synthesises the best answer from all of them, flags where providers disagree, or runs a majority-vote across outputs. One click from Compare — no separate queue, no extra setup.

Prompt template library

Starter

Save reusable prompt templates with named variables. Fill in the blanks and queue — no copy-paste gymnastics.

Prompt chaining

Pro

Use the output of one queue item as the input to the next. Build multi-step AI pipelines without writing code.

Batch CSV import

Starter

Upload a CSV of prompts and queue them all at once. Ideal for bulk content generation, testing, or data processing workflows.

Custom routing rules

Pro

Define cost and model threshold rules that override the automatic routing decision — for example: "never use Claude if estimated cost exceeds $0.02" or "always route Code prompts to GPT-4o". Builds on top of the existing 6 routing modes without replacing them.

Usage export (CSV & JSON)

Starter

Export raw usage data — token counts, costs, timestamps, provider and model — as CSV on Starter or CSV + JSON on Pro and above. Distinct from the session digest HTML export, which is a formatted report. Useful for importing into spreadsheets or external analytics tools.

Image generation

Starter

Queue image prompts to DALL-E 3, Flux, Ideogram, and Stable Diffusion (locally via ComfyUI — free). Each image is a queue item. Batch-generate dozens while you work on something else. Results auto-save to a folder you choose. Same routing and cost-tracking model you already know.

Video generation

Pro

The queue model is tailor-made for video AI. Runway, Pika, and Kling take 2–10 minutes per clip and cost real money — exactly when a managed queue with per-job cost tracking earns its keep. Submit a batch, walk away, come back to finished clips ready to download.

Webhook output delivery

Pro

POST completed responses to any URL the moment they're ready. Connect AIQ to Zapier, Make, or your own backend without polling.

Local AI — 5 providers ✓ Live

Free

Run any model locally via Ollama, LM Studio, Jan.ai, LocalAI, or llama.cpp. Zero API cost, complete privacy, fully offline. Models are discovered automatically — Llama, Mistral, Phi, Gemma, Qwen, DeepSeek-R1 and more. All server ports are configurable. v0.7.0 adds a Local / Network toggle so you can connect to a model running on another machine on your LAN or over VPN — no localhost required.

🔌 7 new cloud providers

In development

All 7 use the existing OpenAI SDK — no new npm packages needed.

Phase 1 · v0.6.0 — Fireworks AI (fastest inference, Llama/DeepSeek/Qwen) · Together AI (200+ open-source models, $25 free credit) · Cerebras (wafer-scale speed, ~1,800 tok/s, free tier) · MiniMax (MiniMax M3 — GPT-4o quality at $0.60/M input) · Cohere (enterprise instruction-following, trial key)

Phase 2 · v0.7.0 — Perplexity AI (search-grounded responses with live web citations embedded in every reply)

Phase 3 · v0.8.0 — OpenAI Codex (dedicated coding agent; reuses your existing OpenAI key — zero extra setup)

Pro+ tier — Unlimited queue & Consensus mode

Pro+ — Coming soon

Pro+ removes the 500-item queue soft cap entirely, raises cloud limits to 10,000 prompts and 20M tokens per month, adds Consensus mode (meta-model synthesis across Compare results), and includes priority email support. Built for solo power users who outgrow Pro without needing team features.

Document context injection

Pro

Attach local files — PDF, DOCX, TXT — as persistent context for a project. The file content is injected into the system prompt for every prompt in that project. Nothing leaves your machine: files are read locally and never uploaded to any server.

Email digest

Pro+ — Coming soon

Schedule automated email digests of your completed session activity — daily or weekly. Free and Starter users can already export session digests as local HTML files; Pro+ adds email delivery so you receive a formatted summary without opening the app.

Cost forecasting

Pro

Predict your monthly AI spend based on current usage trends. Get warned before you blow a budget, not after.

iOS & Android companion

Mobile

Queue prompts from your phone, receive push notifications on completion, monitor live costs. Included with Starter and Pro at no extra charge.

Scheduled-items calendar view

Starter

A week/month grid that shows all your upcoming scheduled queue items as visual blocks — click any item to preview, edit, or cancel it, and drag to reschedule. Pairs with the usage heatmap below to give you a unified past/forward view of everything in your queue, laid out on a timeline instead of a flat list.

Usage Insights panel

Pro

A new Insights sidebar panel powered entirely by your existing local SQLite data — no new infrastructure needed. Shows time-series charts of prompts/day, cost/day, and tokens/day; provider and model distribution; tag-type breakdown; and a busiest-hours heatmap so you can see exactly when and how you're using each provider.

Usage heatmap calendar

Pro

A GitHub-style contribution graph showing prompt volume and cost by day over the last 90 days. Lives inside the Insights panel alongside the scheduled-items calendar, giving you one place to see your full AI usage history at a glance — dark squares mean high-activity days, colours shift from tokens to cost.

Prompt habit analysis

Pro+

Pattern observations that surface concrete routing efficiency suggestions based on your actual usage history — for example: "You route 90% of Research prompts to Claude, but Gemini costs 4× less for that tag type." Runs entirely against local SQLite data; no prompt content is analysed externally or sent anywhere.

AI-powered prompt optimization

Pro+

A local model (Ollama or LM Studio) reviews your prompt patterns and suggests rewrites and routing changes that cut cost or improve output quality. Requires a local provider to be configured. Because analysis runs on your own hardware, no prompt content ever leaves the machine — the optimization is completely private.

Agent Gateway — local OpenAI-compatible server

All tiers · v0.6.0

AIQ spins up a local HTTP server (localhost:8787) that speaks the standard OpenAI Chat Completions API. Any AI agent framework — Hermes, OpenClaw, n8n, LangGraph, CrewAI, AutoGen, OpenAI Agents SDK — can point its base_url at AIQ and instantly inherit AIQ's full routing, rate-limit management, cost tracking, and provider fallback. No code changes on the agent side. Pass aiq/auto, aiq/cheapest, or aiq/fastest as the model name to invoke routing modes, or pass any real model name to force a specific provider. Streaming (stream: true) is fully supported. A new Gateway panel shows server status and a live request log.

Also ships with OpenRouter as a new provider — 500+ models through a single API key, using the same OpenAI-compatible SDK already in the app.

Provider health scoring

Pro · v0.6.0

Replaces binary UP / DOWN status with a rolling composite score per provider: latency (p50 & p95), error rate %, token throughput (tokens/sec), and RPM headroom. Each provider card in the Usage Dashboard shows a live gauge. The router uses the score for weighted decisions in auto mode — so a fast-but-unreliable provider scores lower than a slightly slower but rock-solid one.

Provider latency & throughput metrics

All tiers · v0.6.0

Tokens/sec and average response time displayed per provider in the Usage Dashboard — no new backend needed, derived from existing queue completion events. Cerebras and Groq are the showcase: seeing "Cerebras ~1,800 tok/s vs. GPT-4o ~45 tok/s" live makes the fastest routing mode instantly tangible.

Per-project budget allocation

Pro · v0.6.0

Scope a monthly USD budget to a project rather than to a single provider. The cap spans all providers the project uses, so "this client gets $50/month of AI" works regardless of which provider processes each item. Pairs with cost tracking and usage export for a complete per-project cost picture.

Dynamic cost-based routing

Pro · v0.7.0

A live cost/quality scoring engine that extends the existing cheapest mode and custom routing rules. Define thresholds like "max $0.03/request" or "never use Claude for Chat prompts" and the router enforces them at dispatch time, dynamically selecting the cheapest provider that meets all active rules. The cost table updates live as provider pricing changes — few competitors do this well.

SLA enforcement engine

Pro+ · v0.7.0

Define per-project SLA rules — maximum latency, maximum cost per request, minimum reliability score. When the winning provider fails an SLA check, the router falls back to the next-best provider automatically, with no user intervention. Rules are stored locally per project. This is the "Cloudflare for AI inference" differentiator — no other desktop tool enforces SLAs across providers.

Audit log & routing history

Pro · v0.7.0

An append-only local log of every routing decision: which item went where, which routing mode fired, which rule matched, and what it cost. Stored in SQLite alongside existing usage data — no new infrastructure. Gives governance-conscious users full accountability and is the foundation for a compliance reporting tier later.

🖥 Local LLM network mode

All plans · v0.7.0

Connect local AI providers (Ollama, LM Studio, Jan.ai, LocalAI, llama.cpp) to a model running on any machine on your network — a home server, a more powerful workstation, or a remote host over VPN. Each provider card gets a Local / Network toggle: Local locks to localhost (zero-config, unchanged today); Network lets you enter the host and port, shows the correct setup hint for that tool, and displays the full constructed URL. A built-in Test Connection button pings the server and confirms models are reachable before you save.

Cost allocation & chargeback export

Pro · v0.8.0

Assign a cost-center label — client, department, or team — to any project. Usage exports include the cost-center field so you can generate per-client or per-department spend reports in your own spreadsheet or BI tool. Enables chargeback billing for MSPs and freelancers without requiring a full billing engine.

Agent & MCP routing

Pro+ · v1.0

Route agentic task payloads — OpenAI Agents, LangGraph, CrewAI, MCP tool calls — through the queue the same way text prompts are routed today. The queue model already handles async workloads with retry, cost tracking, and provider fallback; this extends it to multi-step agent runs. An emerging market with very few tools doing it well.

Pricing

Simple, honest pricing.

Start free. Upgrade when you need more. No usage meters on top of your API costs.

Monthly subscription · Cancel any time · No long-term commitment

Free

For anyone who wants to run local AI or explore the queue engine. No credit card, no API key needed to start.

$0/ forever

5 local AI providers · no cloud required

Download Free

✓5 local AI providers — Ollama, LM Studio, Jan.ai, LocalAI, llama.cpp
✓Manual & Free Tier routing modes
✓Queue up to 10 items
✓Budget spend visibility (view-only)
✓Basic usage dashboard
✓1 project
✓Prompt type tags (Chat, Research, Code…)
✓Live token & cost estimate before queuing
✓⚡ Urgent tag — priority queue boost
✓100% local & private — always
–Free cloud tier (Gemini, Groq, Mistral)
–Auto / Balance / Cheapest / Fastest routing
–Budget caps & overage alerts
–Mobile companion app

Every feature, side by side.

	Free	Starter	Pro	Pro+ Soon	Team Soon
Pricing
Monthly priceAll plans are monthly subscriptions — cancel any time	$0	$9 / mo	$19 / mo	$34 / mo	$49 / user / mo
AI Providers
AI providers (total)5 local + 3 free cloud + 4 paid cloud = 12	5	8	All 12	All 12	All 12
Local AI (Ollama, LM Studio, Jan.ai, LocalAI, llama.cpp)No API key · $0 per request · fully offline	✓	✓	✓	✓	✓
Free cloud tier (Gemini, Groq, Mistral)Permanent free access — no credit card required	–	✓	✓	✓	✓
Paid cloud (Claude, OpenAI, DeepSeek, xAI Grok)	–	–	✓	✓	✓
Monthly Cloud Limits (AIQ-side caps — not provider API limits)
Monthly cloud prompt runsDirect API calls to cloud providers via AIQ	–	500/mo	2,500/mo	10,000/mo	25,000/mo (pooled)
Monthly cloud tokensCumulative tokens processed through cloud providers	–	1M/mo	5M/mo	20M/mo	60M/mo (pooled)
Media Generation
Image generationDALL-E 3, Flux, Ideogram, Stable Diffusion (local) — Roadmap	–	✓	✓	✓	✓
Video generationRunway, Pika, Kling — Roadmap	–	–	✓	✓	✓
Auto-save media output to folderRoadmap — ships with image generation	–	✓	✓	✓	✓
Queue
Max queue depth	10 items	100 items	500 (soft cap)	Unlimited	Unlimited
Background queue processing	✓	✓	✓	✓	✓
Agent Gateway — local OpenAI-compatible serverPoint Hermes, n8n, OpenClaw, LangGraph, CrewAI, AutoGen at AIQ — ships v0.6.0	v0.6.0	v0.6.0	v0.6.0	v0.6.0	v0.6.0
Real-time rate limit tracking	✓	✓	✓	✓	✓
⚡ Urgent tag — priority boostJump the queue on any plan	✓	✓	✓	✓	✓
Tag-based smart priorityAll 9 tag types boost queue position	–	–	✓	✓	✓
Batch CSV importRoadmap — Starter and above	–	✓	✓	✓	✓
Scheduled-items calendar viewWeek/month grid of upcoming queue items — Roadmap	–	✓	✓	✓	✓
Routing Modes
Manual routing	✓	✓	✓	✓	✓
Free Tier routingPrefers local providers first, then Gemini, Groq, Mistral	✓	✓	✓	✓	✓
Auto routingScores all providers dynamically	–	✓	✓	✓	✓
Balance routingRound-robins across providers	–	✓	✓	✓	✓
Cheapest routingLowest cost per token, real-time	–	–	✓	✓	✓
Fastest routingPrefers lowest-latency providers	–	–	✓	✓	✓
Custom routing rulesCost thresholds, model overrides — Roadmap	–	–	✓	✓	✓
Cost & Analytics
Basic usage dashboardToken counts, request counts	✓	✓	✓	✓	✓
Budget spend visibilitySee estimated spend per provider	View-only	View-only	✓	✓	✓
Cost tracking per provider & model	–	✓	✓	✓	✓
Budget caps & overage alerts	–	–	✓	✓	✓
Cost forecastingRoadmap	–	–	✓	✓	✓
Usage history exportCSV (Starter) · CSV + JSON (Pro+) — Roadmap	–	✓	✓	✓	✓
Usage Insights panelTime-series charts, provider distribution, tag breakdown, heatmap — Roadmap	–	–	✓	✓	✓
Usage heatmap calendarGitHub-style 90-day prompt volume & cost graph — Roadmap	–	–	✓	✓	✓
Prompt habit analysisRouting efficiency suggestions from local usage data — Roadmap	–	–	–	✓	✓
Results panelDedicated tab — full response text, search, filter by project or provider, one-click copy, green unread badge	✓	✓	✓	✓	✓

FAQ

Common questions.

Is this a chat app?

No. AIQ is a background queue engine — you fire off prompts and get on with your work while they process. It's built for people who send a lot of AI requests and need them managed, routed, and tracked, not for having a single conversation. If you want a chat interface, use Claude.ai or ChatGPT. AIQ is what sits behind your workflow.

What is the Agent Gateway?

The Agent Gateway (shipping v0.6.0) is a local HTTP server that AIQ runs on your machine at localhost:8787. It speaks the standard OpenAI Chat Completions API, so any agent framework — Hermes, n8n, CrewAI, LangGraph, AutoGen — can point its base_url at AIQ and immediately get rate-limit queuing, cost tracking, smart routing, and automatic provider fallback. One config change in your framework, nothing else to set up.

Do my prompts pass through your servers?

No. Every API call goes directly from your machine to the AI provider — Claude, OpenAI, Gemini, Groq, or whichever you've configured. AIQ is local software. We never see your prompts, your responses, or your API keys. The only data that leaves your machine is anonymous usage analytics (opt-out available in Settings), and your payment details which are handled entirely by Lemon Squeezy.

What's the difference between AIQ and OpenRouter?

OpenRouter is a cloud-hosted proxy that gives you access to hundreds of models through one endpoint — great for model breadth. AIQ runs locally, owns your queue, tracks your costs in real time, and provides a desktop UI. Starting in v0.6.0, AIQ also exposes its own OpenAI-compatible endpoint so you can use OpenRouter as a provider inside AIQ — getting both OpenRouter's model catalogue and AIQ's queue management together.

Which plan do I need for the Agent Gateway?

The Agent Gateway is available on all plans — including Free. There are no per-request fees beyond what your AI providers charge you directly. The gateway ships in v0.6.0 (August 2026).

Do I need to be a developer to use this?

Not for the core app. If you're using AIQ to queue and route prompts yourself, it's a standard desktop app — download, install, paste in your API keys, and go. The Agent Gateway does require changing one config line in your agent framework, so basic familiarity with tools like n8n or LangGraph helps there.

What happens if a provider goes down mid-run?

The queue retries the item automatically up to 3 times on transient errors (network timeouts, 5xx errors). If the provider is still unavailable, the item settles into an error state where you can retry manually with one click. Through the Agent Gateway, the router falls back to the next available provider automatically so your agent keeps running without seeing the switchover.

I already use Hermes / n8n / CrewAI. Why add AIQ?

Those frameworks handle agent logic — planning, tool use, multi-step reasoning. What they don't manage is the underlying LLM traffic: rate limits, cost tracking across providers, smart routing based on cost or speed, and automatic fallback when a provider is down. AIQ handles that layer so your agents don't have to. One config line change and every LLM call your agent makes goes through AIQ's queue.

Can I use local models with the Agent Gateway?

Yes. Local providers are first-class in AIQ and the gateway routes to them the same as any cloud provider. Pass aiq/free as the model name and AIQ prefers local providers first — fully offline, no API costs. Pass aiq/auto and the router weights local providers heavily because their cost is $0.

Is there a free tier?

Yes — free forever, no credit card required. The Free plan includes 5 local AI providers (Ollama, LM Studio, Jan.ai, LocalAI, llama.cpp), manual and free-tier routing, a 10-item queue, and the Agent Gateway when it ships. Starter ($9/mo) adds the 3 permanent free cloud tiers (Gemini, Groq, Mistral). Pro ($19/mo) unlocks all 12 providers and the full routing engine.

Capability	✦ AIQ Load Manager	LiteLLM	OpenRouter	Hermes Agent	n8n (built-in)
Desktop GUI — no terminal required	✓ Native app	CLI / config file	Web UI only	Terminal / config	Browser UI
All data stays local — no cloud sync	✓ 100% local	✓ Self-hosted	Cloud-hosted proxy	✓ Self-hosted	Cloud or self-host
Background queue with priority ordering	✓ Full queue engine	No queue UI	No queue	No queue	Workflow-level only
Rate-limit awareness + auto wait-and-resume	✓ Per-provider RPM/TPM	✓ Router-level	✓ Provider-level	Errors on 429	Fails on 429
Agent Gateway — OpenAI-compatible local endpoint	✓ v0.6.0	✓ Proxy server	Cloud endpoint only	Consumer only	Consumer only
Live cost tracking per provider + model	✓ Real-time dashboard	Logs only	Usage page	None	None
Pre-send cost estimate before queuing	✓ Live as you type	—	—	—	—
Routing modes (auto / cheapest / fastest / free tier)	✓ 6 modes	✓ Router strategies	✓ Provider routing	Single-provider focus	Manual per node
Local AI (Ollama, LM Studio, llama.cpp, etc.)	✓ 5 local providers	✓ Ollama + others	Cloud only	✓ Via Ollama	✓ Via Ollama
Compare mode — same prompt to multiple providers	✓ Side-by-side	—	—	—	—
Per-project budget caps & alerts	✓ Pro	✓ Config-level	✓ Org-level	—	—
Pricing	Free forever $9 / mo Starter $19 / mo Pro	Free OSS · $49/mo cloud	Free + usage fees	Free OSS	Free OSS · $20/mo cloud

Route smarter.Every AI. One queue.

Stop juggling APIs.Start shipping.

Background queue processing

Six intelligent routing modes

Cost & token tracking

Projects & persistent history

Prompt type tags

Live cost & token estimation

Compare mode Pro

Real-time web search

Auto-retry on failure

Standing instructions

Response style presets

Results panel

Per-project response history

Session digest export

Per-provider default model

Built-in bug reporting

100% local & private

Native desktop app

Your agent stack.AIQ's routing.

One config change

No more 429 errors

Smart routing by model name

Cost tracking across all agents

Automatic fallback

100% local

iOS & Android — yes, it works.

🖥️ Windows & macOS Live

🐧 Linux Roadmap

📱 iOS & Android Coming soon

Features worth building next

🐧 Linux native app

Cross-provider conversation context

Pro+ tier

Consensus mode

Prompt template library

Prompt chaining

Batch CSV import

Custom routing rules

Usage export (CSV & JSON)

Image generation

Video generation

Webhook output delivery

Local AI — 5 providers ✓ Live

🔌 7 new cloud providers

Pro+ tier — Unlimited queue & Consensus mode

Document context injection

Email digest

Cost forecasting

iOS & Android companion

Scheduled-items calendar view

Usage Insights panel

Usage heatmap calendar

Prompt habit analysis

AI-powered prompt optimization

Agent Gateway — local OpenAI-compatible server

Provider health scoring

Provider latency & throughput metrics

Per-project budget allocation

Dynamic cost-based routing

SLA enforcement engine

Audit log & routing history

🖥 Local LLM network mode

Cost allocation & chargeback export

Agent & MCP routing

Simple, honest pricing.

Every feature, side by side.

Why not just useLiteLLM or OpenRouter?

Common questions.

Is this a chat app?

What is the Agent Gateway?

Do my prompts pass through your servers?

What's the difference between AIQ and OpenRouter?

Which plan do I need for the Agent Gateway?

Do I need to be a developer to use this?

What happens if a provider goes down mid-run?

I already use Hermes / n8n / CrewAI. Why add AIQ?

Can I use local models with the Agent Gateway?

Is there a free tier?

Route smarter.
Every AI. One queue.

Stop juggling APIs.
Start shipping.

Your agent stack.
AIQ's routing.

Why not just use
LiteLLM or OpenRouter?