How can we help?

Find answers to common questions about using TokenLens to reduce your AI API costs.

  • 🚀 Getting Started — Upload your first log file and understand your results
  • 📊 Understanding Results — What the numbers mean and how to act on them
  • 💰 Saving Money — Turn waste into savings with specific actions
  • 🖥️ Using the Dashboard — Navigate every screen and feature
  • 💳 Plans & Billing — Compare plans, upgrade, and manage your account
  • 🔒 Security & Privacy — How we protect your data

🚀 Getting Started

How do I upload my first log file? +

Getting your first waste analysis takes less than a minute:

  1. Go to tokenlens.co/app and click Upload in the sidebar
  2. Drag and drop your log file onto the upload area, or click to browse
  3. Wait a few seconds — TokenLens auto-detects your format
  4. Your results appear automatically on the Overview screen
💡
No signup required. You can upload and see results immediately. Create an account only if you want to save your scan history.
What file formats are supported? +

TokenLens accepts three file types:

  • JSON — Single JSON object or array of objects (including nested wrappers)
  • JSONL — One JSON object per line (most common for API logs)
  • CSV / TSV — Comma or tab-separated values with a header row

Within these files, we automatically recognize logs from:

  • OpenAI — Raw API responses, chat completions, usage exports
  • Anthropic — Messages API responses, usage data
  • Google / Vertex AI — Gemini API responses, Cloud Logging exports
  • Grok / xAI — OpenAI-compatible API format, console exports
  • GitHub Copilot — Admin usage CSV exports, VS Code extension data
  • Amazon Q Developer — CloudWatch metrics, CloudTrail events, admin reports
  • Cursor / Codeium / Windsurf / Tabnine — Admin dashboard exports
  • LiteLLM — Proxy spend logs (JSON)
  • Helicone — Request exports (JSON/CSV)
  • Langfuse — Trace exports (JSON)
  • Datadog — Trace logs from AI service monitoring
  • Splunk — Search job results from AI log queries
  • Gateway/proxy wrappers — Nested request/response structures
  • TokenLens VS Code Extension — Session export JSON files
  • Custom formats — Any log with prompt text, model name, and token count
💡
Not sure about your format? Just upload it. TokenLens will tell you if it can't read the file, and show you what fields it found.
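For the custom-format case above, all TokenLens needs is a record containing prompt text, a model name, and a token count. A minimal JSONL record might look like this (the field names here are illustrative — real logs vary by tool, and TokenLens reports which fields it actually found):

```python
import json

# One line of a hypothetical custom-format JSONL log.
# Field names are examples, not a required schema.
line = '{"prompt": "Summarize this ticket", "model": "gpt-4o-mini", "total_tokens": 412}'

record = json.loads(line)
print(record["model"])         # model name, used to look up pricing
print(record["total_tokens"])  # token count, used to compute cost
```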
Where do I get my API log files? +

Here's how to get logs from common sources:

  • OpenAI: Go to platform.openai.com → Usage → Export CSV. Or export API response JSON/JSONL from your app logs.
  • Anthropic: Export usage data from console.anthropic.com → Usage → Export, or use your app's response logs.
  • Google/Vertex AI: Cloud Console → Vertex AI → Usage Metrics → Export, or Cloud Logging filtered to aiplatform.
  • Grok/xAI: console.x.ai → Usage → Export. Uses OpenAI-compatible format, so app logs work directly.
  • GitHub Copilot: GitHub.com → Organization → Settings → Copilot → Usage → Export CSV. Or install the TokenLens VS Code extension for per-developer data.
  • Amazon Q Developer: AWS Console → CloudWatch → CodeWhisperer metrics → Export CSV, or CloudTrail → Event History → filter codewhisperer.
  • Cursor: cursor.com/settings → Team → Usage → Export. For detailed logs, route through a LiteLLM proxy.
  • LiteLLM (best for full data): curl http://localhost:4000/spend/logs > logs.json. Captures everything from any tool.
  • Helicone: helicone.ai → Requests → Filter → Export JSON/CSV.
  • Langfuse: cloud.langfuse.com → Traces → Export JSON.
  • Datadog/Splunk: Set up connectors in the dashboard (Growth plan) for automatic polling, or export AI API traces as JSON manually.
  • Your application: Most teams already log API calls for debugging. Look for files containing prompt text and token counts.
💡
Start small. Even a day or week of logs is enough to find patterns. You don't need months of data for your first scan.
Do I need to create an account? +

No. You can upload and analyze up to 3 files without any account. Your results stay in your browser session.

Creating a free account gives you:

  • 10 uploads per month (vs 3 as guest)
  • Scan history — come back and see past results
  • Feedback — mark clusters to improve future detection
  • Alerts — get notified about new waste patterns
  • Progress tracking — see if your waste is going down over time

📊 Understanding Your Results

What do the numbers on the Overview screen mean? +

After a scan, you'll see four key numbers:

  • Total Spend — The combined cost of every API call in your log file, calculated from token counts and current model pricing
  • Waste Detected — The dollar amount spent on calls that were duplicates or unnecessary. This is money you could save.
  • Waste Rate — The percentage of your total spend that's waste. If you see 80%, that means 80 cents of every dollar is going to duplicate calls.
  • Clusters Found — The number of distinct duplicate patterns we identified. Each cluster is a group of similar calls that could be combined or cached.
⚠️
Don't panic if your waste rate is high. 50-85% is typical. Most teams have no idea they're sending the same prompts hundreds of times. That's exactly what TokenLens is for.
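The relationship between these numbers is simple division. With illustrative figures (matching the 80% example above):

```python
total_spend = 1000.00    # combined cost of all calls in the log file
waste_detected = 800.00  # spend on duplicate or unnecessary calls

waste_rate = waste_detected / total_spend * 100
print(f"Waste rate: {waste_rate:.0f}%")  # → Waste rate: 80%
```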
What is a "cluster"? +

A cluster is a group of API calls that are duplicates or very similar to each other. Think of it like finding 500 copies of the same email in your outbox — you only needed to send it once.

There are four types of clusters:

  • Exact duplicates — Identical prompts sent multiple times. The easiest waste to fix (add a cache).
  • Near-exact — Same prompt with minor differences like extra spaces or punctuation. Same fix: normalize and cache.
  • Template patterns — Same prompt template with different variables, like "Summarize {document}" sent 500 times with different documents. Consider batch processing.
  • Semantic duplicates — Different words but the same meaning. "What's the weather?" and "Tell me today's forecast" — consolidate to one canonical prompt.
What are the 5 detection layers? +

TokenLens uses five increasingly sophisticated methods to find waste. Each layer catches duplicates that the previous one missed:

  • Layer 1 — Exact Match — Finds byte-for-byte identical prompts. 100% accurate. Free
  • Layer 2 — Near Match — Catches the same prompt with tiny differences like extra whitespace or punctuation. Free
  • Layer 3 — Word Overlap — Finds prompts with the same words in a slightly different order. Free
  • Layer 4 — Template Detection — Recognizes prompts that share a template (like "Summarize {X}") but fill in different variables. Starter
  • Layer 5 — Meaning Match — Catches prompts that say the same thing in completely different words. Starter

Free accounts get the first 3 layers, which typically catch 60-70% of waste. Starter and above get all 5 layers for maximum coverage.

Is my waste rate normal? +

Here's what we typically see:

  • 30-50% waste — Below average. Your team is already somewhat thoughtful about API usage. There's still room to optimize.
  • 50-70% waste — Average. Most teams fall here. There are clear caching and deduplication opportunities.
  • 70-90% waste — Common for fast-growing teams or apps with microservices. Multiple services often call the same prompts independently.
  • 90%+ waste — Usually indicates a retry loop, a batch process resending identical calls, or a testing script left running.
What do the model recommendations mean? +

When TokenLens sees you using an expensive model (like GPT-4o) for simple tasks, it suggests a cheaper alternative. Each recommendation shows:

  • Current model — What you're using now and what it costs
  • Suggested model — A cheaper option and its cost
  • Monthly savings — How much you'd save by switching
  • Risk level — Green (safe), yellow (test first), or red (quality may drop)

For example, using GPT-4o to classify support tickets? GPT-4o-mini does it just as well for 90% less.

How do I give feedback on results? +

On the Clusters screen, each cluster has feedback buttons:

  • ✓ Confirmed waste — "Yes, these are real duplicates." This helps TokenLens learn what waste looks like in your codebase.
  • ✗ Not waste — "These aren't actually duplicates." This tunes thresholds so you see fewer false positives.
  • ~ Partial — "Some are waste, some aren't." This is the most nuanced signal and helps fine-tune detection.

Your feedback improves detection accuracy over time. The more you rate, the better TokenLens gets at finding your specific waste patterns.

💰 Saving Money

How do I actually reduce my AI costs? +

TokenLens shows you the waste — here's how to fix the most common patterns:

  • Exact duplicates → Add a caching layer. Use the prompt text (or a hash) as the cache key. Even a 1-hour TTL can eliminate most duplicates.
  • Near-exact duplicates → Normalize your prompts before sending. Strip extra whitespace, lowercase where possible, use a consistent format.
  • Template patterns → If you're sending "Summarize {doc}" 500 times, consider batch processing or a smarter caching strategy.
  • Wrong model → Switch classification and extraction tasks from GPT-4o to GPT-4o-mini. For most routine tasks, the cheaper model works just as well.
💡
Start with the biggest cluster. Sort clusters by "wasted cost" and fix the top 3. That alone often recovers 50-70% of the total waste.
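The first two fixes — normalize, then cache on the prompt (or its hash) — can be sketched like this. This is a minimal in-memory example; in production you would likely use Redis or similar, and `call_model` is a placeholder for your actual API call:

```python
import hashlib
import time

_cache = {}          # sha256(normalized prompt) -> (timestamp, response)
TTL_SECONDS = 3600   # the 1-hour TTL suggested above

def normalize(prompt: str) -> str:
    # Collapse whitespace and lowercase so near-exact duplicates
    # hit the same cache entry
    return " ".join(prompt.split()).lower()

def cached_call(prompt: str, call_model) -> str:
    key = hashlib.sha256(normalize(prompt).encode()).hexdigest()
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]                  # duplicate — no API spend
    response = call_model(prompt)      # pay only for genuinely new prompts
    _cache[key] = (time.time(), response)
    return response
```

With this in place, `cached_call("Summarize this doc", ...)` followed by `cached_call("  summarize   THIS doc ", ...)` makes only one real API call — the second request is served from the cache.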
What is Caching Analysis? Starter +

The Caching Analysis feature (available on Starter plans and above) looks at your duplicate patterns and tells you specifically:

  • Which prompts would benefit most from caching
  • What cache strategy to use (exact, semantic, or template-based)
  • Estimated cache hit rate for each pattern
  • Projected monthly savings if you implement the cache

It's like having a senior engineer review your API calls and tell you exactly where to add caching.

What is Anomaly Detection? Growth +

Anomaly Detection (available on Growth plans) watches for unusual spending patterns:

  • Cost spikes — A sudden jump in spending compared to your normal baseline
  • Retry storms — Hundreds of identical calls in a short period (usually a runaway automated process)
  • Model misuse — A team suddenly using expensive models for tasks that don't need them

Each anomaly gets a severity level (critical, warning, info) so you know what to investigate first.

What is Prompt Scoring? Growth +

Prompt Scoring grades your prompts on a 0-100 scale across four dimensions:

  • Efficiency — Are you using too many tokens for the task? Verbose prompts cost more.
  • Specificity — Vague prompts produce longer, more expensive responses. Specific prompts save tokens.
  • Cacheability — Could this prompt be cached? Prompts that don't change are prime cache candidates.
  • Cost efficiency — Are you using the right model? A simple classification doesn't need GPT-4o.

Each dimension includes specific suggestions for improvement with estimated savings.

How do I show my boss the ROI? Growth +

The ROI Report (Growth plan) gives you a ready-to-share summary showing:

  • Total waste found in dollars
  • Breakdown by category (duplicates, model waste, prompt inefficiency)
  • Net savings after subtracting the TokenLens subscription cost
  • Payback period — how many days until the subscription pays for itself
  • Projected annual savings

You can download it as a PDF directly from the dashboard. Most teams see 3-10× ROI — meaning for every $1 spent on TokenLens, you save $3-10 in AI costs.
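The arithmetic behind the payback numbers is straightforward. A sketch with hypothetical figures (your actual savings come from the report itself):

```python
monthly_waste_recovered = 1500.00  # hypothetical savings from implemented fixes
subscription_cost = 299.00         # Growth plan, monthly

net_savings = monthly_waste_recovered - subscription_cost
roi = monthly_waste_recovered / subscription_cost
payback_days = subscription_cost / (monthly_waste_recovered / 30)

print(f"Net monthly savings: ${net_savings:.2f}")
print(f"ROI: {roi:.1f}x")
print(f"Payback: {payback_days:.0f} days")
```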

What is Token Efficiency Deep Dive? Growth +

Token Efficiency (Growth plan) analyzes every call pattern at the token level to find hidden waste:

  • System prompt bloat — Detects large system prompts (800+ tokens) sent identically on every API call. If you're sending the same 2,000-token system prompt 500 times, that's nearly 1M wasted tokens.
  • Verbose output detection — Finds calls where the model's response is 3× longer than your input. Often means the model is over-generating and you could add a max_tokens limit or ask for concise output.
  • Micro-call overhead — Catches high-frequency calls with tiny inputs (under 50 tokens). These could often be batched into a single request.
  • Input/output ratio analysis — Shows the I/O balance for each call pattern so you can spot inefficiencies at a glance.

Each issue includes estimated saveable tokens and a specific recommendation (use prompt caching, add max_tokens, batch requests, etc.).
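The system-prompt bloat estimate from the example above (a 2,000-token system prompt sent 500 times) works out as follows — a sketch of the kind of calculation behind the "saveable tokens" figure:

```python
system_prompt_tokens = 2000
call_count = 500

total_sent = system_prompt_tokens * call_count       # tokens spent on the system prompt
saveable = system_prompt_tokens * (call_count - 1)   # all but one copy could be cached
print(f"Sent: {total_sent:,} tokens, saveable: {saveable:,}")
```

That is 1,000,000 tokens sent and 998,000 saveable — the "nearly 1M wasted tokens" cited above.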

What is Cost Forecasting? Growth +

Cost Forecasting (Growth plan) projects where your AI spending is headed based on your upload history:

  • 7, 30, and 90-day projections — Linear trend analysis across all your scans to predict upcoming spend
  • Daily burn rate — Your average spend per day, so you can set realistic budgets
  • Growth rate tracking — Are costs accelerating, stable, or declining? Catches runaway growth before the invoice arrives.
  • Confidence scoring — Based on how many data points you have (high with 14+ scans, medium with 5+, low with fewer)

The forecast chart shows your actual spend history as a solid line with projected future spend as a dashed extension. Unlike other analysis pages, Cost Forecast uses data from all your scans combined (not a single scan), since it needs the full history to project trends.

💡
Upload weekly. The more data points you have, the more accurate your forecast becomes. With 2+ weeks of regular uploads, you'll get high-confidence projections.
What is the Multi-Model Price Simulator? Growth +

The Model Simulator (Growth plan) answers the question: "What would my workload cost on a different model?"

It takes your actual token volumes from a scan and calculates the exact cost on 12+ models across three providers:

  • OpenAI — GPT-4o, GPT-4-Turbo, GPT-4.1, GPT-4o-mini, GPT-4.1-mini, GPT-4.1-nano
  • Anthropic — Claude Opus, Claude Sonnet, Claude Haiku
  • Google — Gemini 2.5 Pro, Gemini 2.0 Flash, Gemini 2.0 Flash Lite

The comparison table shows each model's simulated cost, savings vs. your current spend, and a monthly projection. Models are sorted cheapest-first, with the best option highlighted in green.
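Under the hood, the simulation is token volumes multiplied by per-token prices. A sketch of the idea (the prices below are placeholders, not current list prices — check each provider's pricing page):

```python
# Actual token volumes from a scan (hypothetical numbers)
input_tokens = 40_000_000
output_tokens = 8_000_000

# Placeholder $ per 1M tokens as (input, output) — NOT real current pricing
pricing = {
    "gpt-4o":       (2.50, 10.00),
    "gpt-4o-mini":  (0.15, 0.60),
    "claude-haiku": (0.80, 4.00),
}

costs = {
    model: (input_tokens / 1e6) * p_in + (output_tokens / 1e6) * p_out
    for model, (p_in, p_out) in pricing.items()
}
# Cheapest first, as in the comparison table
for model, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{model}: ${cost:,.2f}")
```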

💡
Different tasks, different models. You don't have to move everything to one model. Use the simulator to find the cheapest option for each call pattern, then mix and match.

🖥️ Using the Dashboard

What are the different dashboard screens? +

The dashboard sidebar shows all available screens. Here's what each one does:

  • Upload — Drop your log files here. Includes step-by-step export guides for 14 tools: Copilot, Amazon Q, Cursor, Grok, OpenAI, Anthropic, Google, Codeium, Tabnine, LiteLLM, Helicone, Langfuse, Datadog, and Splunk.
  • Overview — Your scan summary with key metrics: total spend, waste, savings rate, and charts.
  • Clusters — Detailed list of every waste pattern found. Search, sort, expand for details, and give feedback.
  • Spend — See where your money goes — broken down by model and team with visual charts.
  • History — All your past scans. Click any scan to reload its results. Track your progress over time.
  • Connectors — Set up Datadog or Splunk connections for automatic log polling. (Growth+)
  • Alerts — Create notification rules so you know when waste patterns appear.
  • Insights — Your learning system: detection accuracy stats, adaptive threshold tuning, anonymized waste patterns.
  • Caching — Identifies which API calls can be cached and projects the savings. (Starter+)
  • Model Optimizer — Recommends cheaper models for each call pattern with risk scores. (Starter+)
  • Anomalies — Detects cost spikes, retry storms, and concentration risks. (Growth+)
  • Prompt Scoring — Grades each prompt 0-100 across efficiency, specificity, and cacheability. (Growth+)
  • ROI Report — Boardroom-ready savings breakdown with payback period. (Growth+)
  • Token Efficiency — Deep token-level analysis: system prompt bloat, verbose output, I/O ratios. (Growth+)
  • Cost Forecast — 7/30/90-day spend projections with growth rate and burn rate tracking. (Growth+)
  • Model Simulator — Simulates your workload cost on 15+ models across OpenAI, Anthropic, Google, Grok, and Copilot. (Growth+)
  • Account — Your profile, security settings (MFA), and plan details.

Some screens have a 🔒 icon — these are available on paid plans. Click them to see what you'd unlock.

💡
Scan selector: All analysis pages (Caching, Model Optimizer, Anomalies, Prompt Scoring, ROI, Token Efficiency, and Model Simulator) have a dropdown at the top to pick which historical scan to analyze — not just the most recent upload.
How do I view past scans? +

The History screen shows all your previous scans with timestamps and key metrics. Click any scan to reload its full results into the dashboard.

You can also use the scan selector dropdown at the top of every analysis page (Caching, Model Optimizer, Anomalies, Prompt Scoring, ROI, Token Efficiency, and Model Simulator) to pick which historical scan to analyze — without having to go back to History first.

The one exception is Cost Forecast, which always uses data from all your scans combined (since it needs the full history to project trends).

💡
Track your progress: Upload logs weekly and watch your waste rate drop as you implement fixes. The Cost Forecast page shows your improvement trend over time.
Why are some features locked? +

Features with a 🔒 icon require a paid plan. When you click a locked feature, you'll see:

  • What the feature does
  • Which plan unlocks it
  • Key benefits of upgrading

You can also click "Compare Plans" from your Account page to see a full side-by-side comparison of what each plan includes.

How do I export or download reports? +

Export options depend on your plan:

  • Free: CSV export of cluster data
  • Starter: CSV + JSON + branded PDF reports
  • Growth: All formats + ROI report PDF

Look for the Export or Download buttons on the Overview and Clusters screens.

How do I switch between dark and light mode? +

Click the ☀️/🌙 icon in the top-right corner of any page. Your preference is saved and persists across visits.

💳 Plans & Billing

What are the different plans? +

TokenLens has four plans designed for different team sizes and AI spending levels:

  • Free ($0/forever) — Perfect for trying TokenLens. 3 guest analyses without signup, or 10 uploads/month with a free account. 5,000 records per file, 3 detection layers, basic Copilot/Amazon Q analysis. Great for a first look at your waste.
  • Starter ($99/month) — For developers and small teams spending $500+/mo on AI. 25 uploads/month, 50K records, all 5 detection layers, 5 team members with shared scans, caching analysis, model suggestions, Copilot/Cursor deep analysis with per-developer breakdowns, waste trend charts, adaptive thresholds, pattern library, and PDF reports. Typical ROI: 3-5×.
  • Growth ($299/month) — For teams managing AI spend at scale. Unlimited uploads, 500K records, 20 seats with team management, VS Code extension with cloud sync, anomaly detection, prompt scoring, token efficiency deep dive, cost forecasting, multi-model price simulator, ROI reports, cross-tool comparison, Datadog/Splunk connectors, Slack/webhook/budget alerts, and shared scans. Typical ROI: 10×.
  • Enterprise (custom) — For large organizations needing unlimited seats with RBAC, SSO (SAML/OIDC), activity audit logs, self-hosted deployment, custom integrations, dedicated CSM, SLA guarantee, and SOC 2 & HIPAA readiness. Contact sales.
How do I upgrade my plan? +

There are several ways to upgrade:

  • Click any locked feature — the upgrade prompt has a direct button
  • Go to Account in the sidebar and click Compare Plans
  • Click the upgrade banner in the sidebar (visible on Free and Starter plans)

You'll be taken to a secure Stripe checkout page. Your new features activate immediately after payment.

What are the upload limits? +
  • Free: 10 uploads per month, up to 5,000 records per file
  • Starter: 25 uploads per month, up to 50,000 records per file
  • Growth: Unlimited uploads, up to 500,000 records per file
  • Enterprise: Unlimited everything

Upload counts reset at the start of each month (midnight UTC). The counter on the Upload screen shows how many uploads you have left.

Can I cancel anytime? +

Yes. All paid plans are month-to-month with no long-term commitment. Cancel anytime from Account → Manage Subscription, or through the Stripe customer portal.

When you cancel, you keep access to paid features until the end of your current billing period. After that, your account reverts to the Free plan. Your scan history is preserved.

Do you offer annual pricing? +

Yes! Annual plans save you 2 months (about 17% off):

  • Starter: $990/year (saves $198 vs monthly)
  • Growth: $2,990/year (saves $598 vs monthly)

Annual pricing is available during checkout.

How many team members can use TokenLens? +

TokenLens supports team collaboration on paid plans:

  • Free: No team features (single user)
  • Starter: Up to 5 members with team management, shared scans, and admin/member roles
  • Growth: Up to 20 members with full RBAC (admin, member, viewer roles), shared scans, activity log, and ownership transfer
  • Enterprise: Unlimited seats with RBAC, SSO (SAML/OIDC), activity audit logs, and dedicated CSM

To create a team: go to the Account page and click Create Team. You become the team owner and admin automatically.

What are the different team roles? +

Each team member has one of three roles:

  • Admin — Full access. Can upload logs, view and delete scans, invite/remove members, change roles, manage connectors, edit team settings, and view the activity log. The team owner is always an admin and cannot be removed.
  • Member — Can upload logs, view all team scans, export reports, create alerts for themselves, and give feedback on clusters. Cannot delete scans, remove other members, invite new members, or manage connectors.
  • Viewer (Growth+ only) — Read-only access. Can view team scans, reports, and the activity log. Cannot upload, delete, create alerts, or modify anything.
💡
Starter plans have admin and member roles. The viewer role is available on Growth plans and above.
How do I invite team members? +

Only team admins can invite new members:

  1. Go to Account → Team in the dashboard
  2. Click Invite Member and enter their email address
  3. Choose their role (admin, member, or viewer)
  4. They receive an invitation link valid for 7 days
  5. They click the link, log in (or create a free account), and join your team

You can revoke pending invitations at any time. Members can leave voluntarily, but only admins can remove other members. The team owner cannot be removed — they must transfer ownership or delete the team.

🔒 Security & Privacy

Is my data safe? +

Yes. Here's how we handle your data:

  • Read-only analysis. We never touch your production APIs. We only read the log files you upload.
  • No API key storage. TokenLens never asks for or stores your OpenAI/Anthropic API keys.
  • Time-limited retention. Free tier data is deleted after 7 days. Paid tiers have longer retention (90 days to 1 year) but you can delete anytime.
  • Encryption. All data is encrypted in transit (TLS) and at rest on paid plans.
  • Self-hosting option. Enterprise customers can run TokenLens entirely on their own infrastructure — your data never leaves your network.
How do I enable two-factor authentication (MFA)? +
  1. Go to Account in the dashboard sidebar
  2. Scroll to the Security section
  3. Click Enable MFA
  4. Scan the QR code with your authenticator app (Google Authenticator, Authy, 1Password, etc.)
  5. Enter the 6-digit code to confirm

Once enabled, you'll need both your password and a code from your authenticator app to log in.

How do I delete my data? +

You have full control over your data:

  • Delete individual scans: Go to History, click the ✕ button on any scan
  • Clear all history: Go to History, click "Clear All"
  • Delete your account: Contact us at support@tokenlens.co and we'll permanently delete your account and all associated data

Still have questions?

We're happy to help. Reach out and we'll get back to you within 24 hours.

Contact Support →