✦ Desktop app · Windows & macOS · 12 AI providers · Agent Gateway for Hermes, n8n, CrewAI & more

Route smarter.
Every AI. One queue.

Route smarter·Spend less·Stay in flow.
Also works as a local gateway for Hermes, n8n, OpenClaw, LangGraph, CrewAI & AutoGen — one endpoint, every provider.
Cloud Claude OpenAI Gemini Groq DeepSeek Mistral xAI Grok
Local 🖥 Ollama 🖥 LM Studio 🖥 Jan.ai 🖥 LocalAI 🖥 llama.cpp
Soon Fireworks Together Cohere MiniMax Cerebras Perplexity Codex
AIQ Load Manager — Usage Dashboard showing live rate-limit meters, token counts, and cost per provider across multiple AI providers
What it does

Stop juggling APIs.
Start shipping.

AIQ Load Manager is not a chat app. It's a background queue engine — fire off prompts and get on with your work.

Background queue processing

Prompts queue instantly and process in the background every 1.5–3 seconds. Rate limits are tracked in real time — nothing gets dropped or rejected.

Queue panel showing items in processing, complete, pending, and error states
🔀

Six intelligent routing modes

Auto, Balance, Cheapest, Fastest, Free Tier, and Manual modes direct each prompt to the right provider automatically. No manual switching.

Routing mode dropdown showing all six modes, with tier lock badges on Cheapest and Fastest
💰

Cost & token tracking

See exactly how many tokens and dollars you've spent per provider, per model, over any time period. Set budget caps to avoid surprises.

🗂️

Projects & persistent history

Organize prompts into separate projects, each with its own conversation thread. History is written to local SQLite on every turn — conversations survive app restarts exactly where you left off.

🏷️

Prompt type tags

Label each prompt with visual chip tags — Chat, Research, Code, Writing, Analysis, Image, Translate, or ⚡ Urgent. Tags drive provider selection automatically and, on Pro, lift your prompt's place in the queue. The special 🌐 Web Search tag also triggers live result injection before the AI call.

💡

Live cost & token estimation

See a live token count as you type and a ranked per-provider cost table before you hit queue. Know exactly what each prompt will cost on Claude vs. Groq vs. Gemini — before you send it.

Per-provider cost ranking table with estimated cost, model, and availability shown before queuing
⚖️

Compare mode Pro

Send the same prompt to multiple providers simultaneously. Responses arrive in parallel and display as side-by-side columns — provider name, model, full text, copy button, token count. Instantly see whether Claude, GPT-4o, or Gemini does better on your task.

Compare mode — three provider responses shown side by side with token counts and copy buttons
🌐

Real-time web search

Tag any prompt 🌐 Web Search and AIQ fetches live results before sending — then injects them into the system prompt. Works with every model, including fully local ones. No tool-calling support needed. Choose Tavily (1,000 free searches/month) or self-hosted SearXNG.

🔁

Auto-retry on failure

Network glitch? Temporary server error? The queue retries the prompt automatically — up to 3 times — without you lifting a finger. Permanent failures (wrong API key, spend blocked) surface immediately so you can act on them.

📋

Standing instructions

Set a global system prompt once in Settings and it's automatically prepended to every prompt you queue, across every provider. Define your preferred tone, language, output format, or any baseline rules — your own AI rulebook that travels with every request.

🎨

Response style presets

Per-provider tone and format controls — choose from Concise, Caveman (ultra-simple words), Bullet-only, ELI5, or write a fully custom instruction. Style text is appended to every prompt sent to that provider. Available on all plans.

📂

Per-project response history

Every completed prompt for a project is stored and browseable. Click View history on any project card to see the full log of prompts, responses, provider/model used, and token counts. Available on all plans.

📄

Session digest export

Export your completed queue as a self-contained HTML file — dark theme, summary stats (items, tokens, cost), and collapsible prompt/response sections for each item. One click opens a native save dialog; the file opens immediately after saving. Starter and above.

🎯

Per-provider default model

Set a preferred model for any cloud provider so every unspecified queue item uses it automatically. Per-item model selection always overrides the default — this just sets your personal preference. Pro+ and above.

🐛

Built-in bug reporting

A Report a Bug button lives in the sidebar. One click opens a pre-filled GitHub issue with your app version and OS already populated — no copy-pasting, no hunting for version numbers.

🔒

100% local & private

Your prompts go directly to the AI provider — nothing passes through our servers. API keys are stored in your OS keychain. Anonymous usage analytics are collected to improve the app (no prompts, no keys, no personal data) — opt out any time in Settings.

🖥️

Native desktop app

Built with Electron and React. Runs natively on Windows 10/11 and macOS. Installs as a proper desktop app — no browser tab to lose.

Connectors panel showing all 12 AI providers — 5 local, 3 free cloud, 4 paid cloud
Agent Gateway

Your agent stack.
AIQ's routing.

Running agents through Hermes, n8n, or CrewAI? Point them at AIQ's local endpoint and they inherit AIQ's full routing, rate-limit queuing, cost tracking, and automatic provider fallback — without changing a line of agent code.

🔌

One config change

Change one line in your agent framework — set base_url to http://localhost:8787/v1. No code changes, no new packages, no API rewrites. Every framework that speaks the OpenAI API works instantly.

🚦

No more 429 errors

Long-running n8n workflows and multi-agent Hermes runs generate bursts of LLM calls that hit rate limits and fail silently. AIQ queues every call, waits for headroom, and retries automatically — your workflow never drops a request at 2 am.

🧭

Smart routing by model name

Pass aiq/auto, aiq/cheapest, aiq/fastest, or aiq/free as the model name and AIQ's routing engine picks the best provider. Or pass any real model name to pin a specific provider — your choice per call.

💰

Cost tracking across all agents

Every token your agents spend — Hermes, n8n, CrewAI, all of them — shows up in AIQ's Usage Dashboard. See real costs per provider, per project, and per session without adding observability tooling to each framework.

🔁

Automatic fallback

If your primary provider goes down mid-run, AIQ's router falls back to the next available provider automatically. Your agent keeps running. The switchover is invisible — the response comes back in the same format.

🔒

100% local

The gateway runs on your machine. No data goes to our servers — agent requests go from your framework, through AIQ, directly to the AI provider. Optional per-project API key prevents other local processes from using the endpoint without permission.

Compatible frameworks — change one config line

Hermes Agent · NousResearch
base_url: http://localhost:8787/v1
n8n · AI Agent node
OpenAI credential → Base URL field
OpenClaw · self-hosted
Provider config → base URL
LangGraph · LangChain
ChatOpenAI(base_url=...)
CrewAI · v1.12+
OPENAI_API_BASE in .env
AutoGen / AG2 · Microsoft
OAI_CONFIG_LIST base_url

Streaming (stream: true) fully supported. Available on all plans — free and paid. Ships in v0.6.0.

Platform coverage

iOS & Android — yes, it works.

The queue concept translates naturally to mobile. Here's exactly what you'd get on each platform — and one honest caveat.

🖥️ Windows & macOS Live

Full background queue processing, all routing modes, unlimited queue depth (Pro), local SQLite storage, OS keychain for API keys. The complete experience.

🐧 Linux Roadmap

Full desktop app via AppImage (runs on any distro — no installation, no root required) and .deb for Debian/Ubuntu. The same complete experience as Windows and macOS. The build config is already in place.

📱 iOS & Android Coming soon

The companion app. Monitor your queue, add prompts on the go, view completed results, and receive push notifications. Mobile doesn't replace desktop — it extends it.

⚠ iOS background processing limit iOS restricts background app execution to under 30 seconds. A live queue loop isn't possible while the app is backgrounded. On iOS, the queue processes while the app is open — or desktop handles it and mobile notifies you when items complete. Android is more permissive and can support continuous background processing.
Mobile-specific features planned
  • Push notifications — get notified the moment a queued item completes, even with the app closed (Android).
  • Share sheet extension — highlight text in any app, tap Share → AIQ, and it lands in your queue instantly.
  • Home screen widget — live queue depth, cost today, and current provider status on your lock screen.
  • Clipboard monitor — optional auto-detect for copied prompts, one tap to queue them.
  • iOS Shortcuts integration — automate "add to AIQ queue" from any Shortcuts workflow.
  • Offline queue — prompts added offline sync to the queue the moment connectivity returns.
  • Secure Enclave key storage — API keys stored in hardware-backed secure storage, not just the OS keychain.
  • Mobile included in Starter & Pro — no separate purchase. One license covers desktop and mobile.
On the roadmap

Features worth building next

Informed by competitor research and user workflows — the gaps no other tool covers.

🐧 Linux native app

All tiers

The full desktop app on Linux — AppImage format runs on any distro without installation or root access, plus a .deb package for Debian and Ubuntu users. The electron-builder config is already in place. Needs a round of testing on a Linux CI runner before public release. All features parity with Windows and macOS.

Cross-provider conversation context

Starter

When the router re-routes a conversation to a different provider mid-thread — because of rate limits, cost, or availability — the stored history is automatically adapted to the new provider's format and forwarded. Your conversation keeps going, invisibly. No other tool can do this, because no other tool routes across providers in the first place.

Pro+ tier

Pro+

A tier above Pro for power users who need the full stack: Consensus mode, advanced prompt chaining, webhook output delivery, and priority support. Designed for teams and high-volume workflows. Details and pricing TBD.

Consensus mode

Pro+

Compare mode already collects every provider's response in one place. Consensus mode goes further: a meta-model synthesises the best answer from all of them, flags where providers disagree, or runs a majority-vote across outputs. One click from Compare — no separate queue, no extra setup.

Prompt template library

Starter

Save reusable prompt templates with named variables. Fill in the blanks and queue — no copy-paste gymnastics.

Prompt chaining

Pro

Use the output of one queue item as the input to the next. Build multi-step AI pipelines without writing code.

Batch CSV import

Starter

Upload a CSV of prompts and queue them all at once. Ideal for bulk content generation, testing, or data processing workflows.

Custom routing rules

Pro

Define cost and model threshold rules that override the automatic routing decision — for example: "never use Claude if estimated cost exceeds $0.02" or "always route Code prompts to GPT-4o". Builds on top of the existing 6 routing modes without replacing them.

Usage export (CSV & JSON)

Starter

Export raw usage data — token counts, costs, timestamps, provider and model — as CSV on Starter or CSV + JSON on Pro and above. Distinct from the session digest HTML export, which is a formatted report. Useful for importing into spreadsheets or external analytics tools.

Image generation

Starter

Queue image prompts to DALL-E 3, Flux, Ideogram, and Stable Diffusion (locally via ComfyUI — free). Each image is a queue item. Batch-generate dozens while you work on something else. Results auto-save to a folder you choose. Same routing and cost-tracking model you already know.

Video generation

Pro

The queue model is tailor-made for video AI. Runway, Pika, and Kling take 2–10 minutes per clip and cost real money — exactly when a managed queue with per-job cost tracking earns its keep. Submit a batch, walk away, come back to finished clips ready to download.

Webhook output delivery

Pro

POST completed responses to any URL the moment they're ready. Connect AIQ to Zapier, Make, or your own backend without polling.

Local AI — 5 providers ✓ Live

Free

Run any model locally via Ollama, LM Studio, Jan.ai, LocalAI, or llama.cpp. Zero API cost, complete privacy, fully offline. Models are discovered automatically — Llama, Mistral, Phi, Gemma, Qwen, DeepSeek-R1 and more. All server ports are configurable.

🔌 7 new cloud providers

In development

All 7 use the existing OpenAI SDK — no new npm packages needed.

Phase 1 · v0.6.0Fireworks AI (fastest inference, Llama/DeepSeek/Qwen) · Together AI (200+ open-source models, $25 free credit) · Cerebras (wafer-scale speed, ~1,800 tok/s, free tier) · MiniMax (MiniMax M3 — GPT-4o quality at $0.60/M input) · Cohere (enterprise instruction-following, trial key)

Phase 2 · v0.7.0Perplexity AI (search-grounded responses with live web citations embedded in every reply)

Phase 3 · v0.8.0OpenAI Codex (dedicated coding agent; reuses your existing OpenAI key — zero extra setup)

Pro+ tier — Unlimited queue & Consensus mode

Pro+ — Coming soon

Pro+ removes the 500-item queue soft cap entirely, raises cloud limits to 10,000 prompts and 20M tokens per month, adds Consensus mode (meta-model synthesis across Compare results), and includes priority email support. Built for solo power users who outgrow Pro without needing team features.

Document context injection

Pro

Attach local files — PDF, DOCX, TXT — as persistent context for a project. The file content is injected into the system prompt for every prompt in that project. Nothing leaves your machine: files are read locally and never uploaded to any server.

Email digest

Pro+ — Coming soon

Schedule automated email digests of your completed session activity — daily or weekly. Free and Starter users can already export session digests as local HTML files; Pro+ adds email delivery so you receive a formatted summary without opening the app.

Cost forecasting

Pro

Predict your monthly AI spend based on current usage trends. Get warned before you blow a budget, not after.

iOS & Android companion

Mobile

Queue prompts from your phone, receive push notifications on completion, monitor live costs. Included with Starter and Pro at no extra charge.

Scheduled-items calendar view

Starter

A week/month grid that shows all your upcoming scheduled queue items as visual blocks — click any item to preview, edit, or cancel it, and drag to reschedule. Pairs with the usage heatmap below to give you a unified past/forward view of everything in your queue, laid out on a timeline instead of a flat list.

Usage Insights panel

Pro

A new Insights sidebar panel powered entirely by your existing local SQLite data — no new infrastructure needed. Shows time-series charts of prompts/day, cost/day, and tokens/day; provider and model distribution; tag-type breakdown; and a busiest-hours heatmap so you can see exactly when and how you're using each provider.

Usage heatmap calendar

Pro

A GitHub-style contribution graph showing prompt volume and cost by day over the last 90 days. Lives inside the Insights panel alongside the scheduled-items calendar, giving you one place to see your full AI usage history at a glance — dark squares mean high-activity days, colours shift from tokens to cost.

Prompt habit analysis

Pro+

Pattern observations that surface concrete routing efficiency suggestions based on your actual usage history — for example: "You route 90% of Research prompts to Claude, but Gemini costs 4× less for that tag type." Runs entirely against local SQLite data; no prompt content is analysed externally or sent anywhere.

AI-powered prompt optimization

Pro+

A local model (Ollama or LM Studio) reviews your prompt patterns and suggests rewrites and routing changes that cut cost or improve output quality. Requires a local provider to be configured. Because analysis runs on your own hardware, no prompt content ever leaves the machine — the optimization is completely private.

Agent Gateway — local OpenAI-compatible server

All tiers · v0.6.0

AIQ spins up a local HTTP server (localhost:8787) that speaks the standard OpenAI Chat Completions API. Any AI agent framework — Hermes, OpenClaw, n8n, LangGraph, CrewAI, AutoGen, OpenAI Agents SDK — can point its base_url at AIQ and instantly inherit AIQ's full routing, rate-limit management, cost tracking, and provider fallback. No code changes on the agent side. Pass aiq/auto, aiq/cheapest, or aiq/fastest as the model name to invoke routing modes, or pass any real model name to force a specific provider. Streaming (stream: true) is fully supported. A new Gateway panel shows server status and a live request log.

Also ships with OpenRouter as a new provider — 500+ models through a single API key, using the same OpenAI-compatible SDK already in the app.

Provider health scoring

Pro · v0.6.0

Replaces binary UP / DOWN status with a rolling composite score per provider: latency (p50 & p95), error rate %, token throughput (tokens/sec), and RPM headroom. Each provider card in the Usage Dashboard shows a live gauge. The router uses the score for weighted decisions in auto mode — so a fast-but-unreliable provider scores lower than a slightly slower but rock-solid one.

Provider latency & throughput metrics

All tiers · v0.6.0

Tokens/sec and average response time displayed per provider in the Usage Dashboard — no new backend needed, derived from existing queue completion events. Cerebras and Groq are the showcase: seeing "Cerebras ~1,800 tok/s vs. GPT-4o ~45 tok/s" live makes the fastest routing mode instantly tangible.

Per-project budget allocation

Pro · v0.6.0

Scope a monthly USD budget to a project rather than to a single provider. The cap spans all providers the project uses, so "this client gets $50/month of AI" works regardless of which provider processes each item. Pairs with cost tracking and usage export for a complete per-project cost picture.

Dynamic cost-based routing

Pro · v0.7.0

A live cost/quality scoring engine that extends the existing cheapest mode and custom routing rules. Define thresholds like "max $0.03/request" or "never use Claude for Chat prompts" and the router enforces them at dispatch time, dynamically selecting the cheapest provider that meets all active rules. The cost table updates live as provider pricing changes — few competitors do this well.

SLA enforcement engine

Pro+ · v0.7.0

Define per-project SLA rules — maximum latency, maximum cost per request, minimum reliability score. When the winning provider fails an SLA check, the router falls back to the next-best provider automatically, with no user intervention. Rules are stored locally per project. This is the "Cloudflare for AI inference" differentiator — no other desktop tool enforces SLAs across providers.

Audit log & routing history

Pro · v0.7.0

An append-only local log of every routing decision: which item went where, which routing mode fired, which rule matched, and what it cost. Stored in SQLite alongside existing usage data — no new infrastructure. Gives governance-conscious users full accountability and is the foundation for a compliance reporting tier later.

Cost allocation & chargeback export

Pro · v0.8.0

Assign a cost-center label — client, department, or team — to any project. Usage exports include the cost-center field so you can generate per-client or per-department spend reports in your own spreadsheet or BI tool. Enables chargeback billing for MSPs and freelancers without requiring a full billing engine.

Agent & MCP routing

Pro+ · v1.0

Route agentic task payloads — OpenAI Agents, LangGraph, CrewAI, MCP tool calls — through the queue the same way text prompts are routed today. The queue model already handles async workloads with retry, cost tracking, and provider fallback; this extends it to multi-step agent runs. An emerging market with very few tools doing it well.

Pricing

Simple, honest pricing.

Start free. Upgrade when you need more. No usage meters on top of your API costs.

Monthly subscription · Cancel any time · No long-term commitment
Free
For anyone who wants to run local AI or explore the queue engine. No credit card, no API key needed to start.
$0/ forever
5 local AI providers · no cloud required
Download Free
  • 5 local AI providers — Ollama, LM Studio, Jan.ai, LocalAI, llama.cpp
  • Manual & Free Tier routing modes
  • Queue up to 10 items
  • Budget spend visibility (view-only)
  • Basic usage dashboard
  • 1 project
  • Prompt type tags (Chat, Research, Code…)
  • Live token & cost estimate before queuing
  • Urgent tag — priority queue boost
  • 100% local & private — always
  • Free cloud tier (Gemini, Groq, Mistral)
  • Auto / Balance / Cheapest / Fastest routing
  • Budget caps & overage alerts
  • Mobile companion app
Pro
For power users managing multiple paid AI APIs who want all routing modes, compare mode, and full cost control.
$19 / month
Cancel any time
Get Pro
  • All 12 providers — 5 local + all 7 cloud (incl. Claude, OpenAI, DeepSeek, xAI Grok)
  • All 6 routing modes incl. Cheapest & Fastest
  • 500 items queue depth (soft cap)
  • 2,500 cloud prompt runs / month · 5M cloud tokens
  • Budget caps & overage alerts per provider
  • Unlimited projects
  • Tag-based smart priority — all 9 tag types boost queue position
  • Response comparison mode (A/B providers)
  • Prompt chaining & multi-step pipelines Roadmap
  • Image & video generation Roadmap
  • Webhook output delivery Roadmap
  • Cost forecasting & trend reports Roadmap
  • Usage Insights panel — charts, heatmap, provider breakdown Roadmap
  • Usage heatmap calendar — 90-day activity graph Roadmap
  • Scheduled-items calendar view Roadmap
  • Usage export CSV + JSON Roadmap
  • iOS & Android companion app included Roadmap
Pro+
For solo power users who need 4× the throughput, unlimited queue depth, Consensus mode, and priority support — without team overhead.
$34 / month
10,000 cloud prompts · 20M tokens / month
Join the Waitlist
  • Everything in Pro, plus…
  • Unlimited queue depth — no soft cap
  • 10,000 cloud prompt runs / month (4× Pro)
  • 20M cloud tokens / month (4× Pro)
  • Consensus mode — meta-model synthesis across Compare results Roadmap
  • Prompt habit analysis — routing efficiency suggestions from your usage data Roadmap
  • AI-powered prompt optimization — local model rewrites, 100% private Roadmap
  • Priority email support
  • All 12 providers & all 6 routing modes
  • All Pro features included
Team
For small teams that need shared settings, admin oversight, and pooled cloud usage.
$49 / user / month
25,000 cloud prompts · 60M tokens pooled
Join the Waitlist
  • Everything in Pro+, plus…
  • 25,000 cloud prompt runs / month (pooled)
  • 60M cloud tokens / month (pooled)
  • Shared team settings
  • Admin controls
  • Priority support — included
  • Unlimited queue depth & projects

Monthly cloud prompt and token limits are AIQ-side caps — they protect the service while keeping costs predictable for you. No usage surcharges are added on top of your API costs. You pay your providers directly at their published rates.

Feature comparison

Every feature, side by side.

Free Starter Pro Pro+ Soon Team Soon
Pricing
Monthly priceAll plans are monthly subscriptions — cancel any time $0 $9 / mo $19 / mo $34 / mo $49 / user / mo
AI Providers
AI providers (total)5 local + 3 free cloud + 4 paid cloud = 12 5 8 All 12 All 12 All 12
Local AI (Ollama, LM Studio, Jan.ai, LocalAI, llama.cpp)No API key · $0 per request · fully offline
Free cloud tier (Gemini, Groq, Mistral)Permanent free access — no credit card required
Paid cloud (Claude, OpenAI, DeepSeek, xAI Grok)
Monthly Cloud Limits (AIQ-side caps — not provider API limits)
Monthly cloud prompt runsDirect API calls to cloud providers via AIQ 500/mo 2,500/mo 10,000/mo 25,000/mo (pooled)
Monthly cloud tokensCumulative tokens processed through cloud providers 1M/mo 5M/mo 20M/mo 60M/mo (pooled)
Media Generation
Image generationDALL-E 3, Flux, Ideogram, Stable Diffusion (local) — Roadmap
Video generationRunway, Pika, Kling — Roadmap
Auto-save media output to folderRoadmap — ships with image generation
Queue
Max queue depth 10 items 100 items 500 (soft cap) Unlimited Unlimited
Background queue processing
Agent Gateway — local OpenAI-compatible serverPoint Hermes, n8n, OpenClaw, LangGraph, CrewAI, AutoGen at AIQ — ships v0.6.0 v0.6.0 v0.6.0 v0.6.0 v0.6.0 v0.6.0
Real-time rate limit tracking
⚡ Urgent tag — priority boostJump the queue on any plan
Tag-based smart priorityAll 9 tag types boost queue position
Batch CSV importRoadmap — Starter and above
Scheduled-items calendar viewWeek/month grid of upcoming queue items — Roadmap
Routing Modes
Manual routing
Free Tier routingPrefers local providers first, then Gemini, Groq, Mistral
Auto routingScores all providers dynamically
Balance routingRound-robins across providers
Cheapest routingLowest cost per token, real-time
Fastest routingPrefers lowest-latency providers
Custom routing rulesCost thresholds, model overrides — Roadmap
Cost & Analytics
Basic usage dashboardToken counts, request counts
Budget spend visibilitySee estimated spend per provider View-only View-only
Cost tracking per provider & model
Budget caps & overage alerts
Cost forecastingRoadmap
Usage history exportCSV (Starter) · CSV + JSON (Pro+) — Roadmap
Usage Insights panelTime-series charts, provider distribution, tag breakdown, heatmap — Roadmap
Usage heatmap calendarGitHub-style 90-day prompt volume & cost graph — Roadmap
Prompt habit analysisRouting efficiency suggestions from local usage data — Roadmap
How it compares

Why not just use
LiteLLM or OpenRouter?

Each tool solves a different slice of the problem. AIQ is the only one that combines a desktop queue, intelligent routing, cost tracking, and an agent gateway in a single local app you own.

Capability ✦ AIQ Load Manager LiteLLM OpenRouter Hermes Agent n8n (built-in)
Desktop GUI — no terminal required ✓ Native app CLI / config file Web UI only Terminal / config Browser UI
All data stays local — no cloud sync ✓ 100% local ✓ Self-hosted Cloud-hosted proxy ✓ Self-hosted Cloud or self-host
Background queue with priority ordering ✓ Full queue engine No queue UI No queue No queue Workflow-level only
Rate-limit awareness + auto wait-and-resume ✓ Per-provider RPM/TPM ✓ Router-level ✓ Provider-level Errors on 429 Fails on 429
Agent Gateway — OpenAI-compatible local endpoint ✓ v0.6.0 ✓ Proxy server Cloud endpoint only Consumer only Consumer only
Live cost tracking per provider + model ✓ Real-time dashboard Logs only Usage page None None
Pre-send cost estimate before queuing ✓ Live as you type
Routing modes (auto / cheapest / fastest / free tier) ✓ 6 modes ✓ Router strategies ✓ Provider routing Single-provider focus Manual per node
Local AI (Ollama, LM Studio, llama.cpp, etc.) ✓ 5 local providers ✓ Ollama + others Cloud only ✓ Via Ollama ✓ Via Ollama
Compare mode — same prompt to multiple providers ✓ Side-by-side
Per-project budget caps & alerts ✓ Pro ✓ Config-level ✓ Org-level
Pricing Free forever $9 / mo Starter $19 / mo Pro Free OSS · $49/mo cloud Free + usage fees Free OSS Free OSS · $20/mo cloud

LiteLLM is a great choice if you want a self-hosted proxy server with deep observability tooling. OpenRouter is ideal if you want a single cloud endpoint across hundreds of models. AIQ does both jobs locally — plus adds the queue UI, cost dashboard, and agent gateway that neither offers out of the box.

FAQ

Common questions.

Is this a chat app?

No. AIQ is a background queue engine — you fire off prompts and get on with your work while they process. It's built for people who send a lot of AI requests and need them managed, routed, and tracked, not for having a single conversation. If you want a chat interface, use Claude.ai or ChatGPT. AIQ is what sits behind your workflow.

What is the Agent Gateway?

The Agent Gateway (shipping v0.6.0) is a local HTTP server that AIQ runs on your machine at localhost:8787. It speaks the standard OpenAI Chat Completions API, so any agent framework — Hermes, n8n, CrewAI, LangGraph, AutoGen — can point its base_url at AIQ and immediately get rate-limit queuing, cost tracking, smart routing, and automatic provider fallback. One config change in your framework, nothing else to set up.

Do my prompts pass through your servers?

No. Every API call goes directly from your machine to the AI provider — Claude, OpenAI, Gemini, Groq, or whichever you've configured. AIQ is local software. We never see your prompts, your responses, or your API keys. The only data that leaves your machine is anonymous usage analytics (opt-out available in Settings), and your payment details which are handled entirely by Lemon Squeezy.

What's the difference between AIQ and OpenRouter?

OpenRouter is a cloud-hosted proxy that gives you access to hundreds of models through one endpoint — great for model breadth. AIQ runs locally, owns your queue, tracks your costs in real time, and provides a desktop UI. Starting in v0.6.0, AIQ also exposes its own OpenAI-compatible endpoint so you can use OpenRouter as a provider inside AIQ — getting both OpenRouter's model catalogue and AIQ's queue management together.

Which plan do I need for the Agent Gateway?

The Agent Gateway is available on all plans — including Free. There are no per-request fees beyond what your AI providers charge you directly. The gateway ships in v0.6.0 (August 2026).

Do I need to be a developer to use this?

Not for the core app. If you're using AIQ to queue and route prompts yourself, it's a standard desktop app — download, install, paste in your API keys, and go. The Agent Gateway does require changing one config line in your agent framework, so basic familiarity with tools like n8n or LangGraph helps there.

What happens if a provider goes down mid-run?

The queue retries the item automatically up to 3 times on transient errors (network timeouts, 5xx errors). If the provider is still unavailable, the item settles into an error state where you can retry manually with one click. Through the Agent Gateway, the router falls back to the next available provider automatically so your agent keeps running without seeing the switchover.

I already use Hermes / n8n / CrewAI. Why add AIQ?

Those frameworks handle agent logic — planning, tool use, multi-step reasoning. What they don't manage is the underlying LLM traffic: rate limits, cost tracking across providers, smart routing based on cost or speed, and automatic fallback when a provider is down. AIQ handles that layer so your agents don't have to. One config line change and every LLM call your agent makes goes through AIQ's queue.

Can I use local models with the Agent Gateway?

Yes. Local providers are first-class in AIQ and the gateway routes to them the same as any cloud provider. Pass aiq/free as the model name and AIQ prefers local providers first — fully offline, no API costs. Pass aiq/auto and the router weights local providers heavily because their cost is $0.

Is there a free tier?

Yes — free forever, no credit card required. The Free plan includes 5 local AI providers (Ollama, LM Studio, Jan.ai, LocalAI, llama.cpp), manual and free-tier routing, a 10-item queue, and the Agent Gateway when it ships. Starter ($9/mo) adds the 3 permanent free cloud tiers (Gemini, Groq, Mistral). Pro ($19/mo) unlocks all 12 providers and the full routing engine.

Ready to tame your AI APIs?

Download free and start routing prompts in under five minutes. No credit card required.

Get Started — It's Free