Claude Code Token Usage: How to See Tokens During a Session

If you are trying to understand Claude Code token usage, the frustrating part is simple: Claude Code does not really show you what a session is costing while it is happening. You feel it later when the bill lands. That is fine for small experiments, but it is not fine when you are debugging agentic loops, running long sessions, or trying to stay inside a hard budget.

That gap is why I built agent-bill-guard. It is a tiny local Python proxy, stdlib only, that sits between Claude Code and the Anthropic API. Instead of waiting until after the fact, you get a running token counter and estimated cost in your terminal as requests happen. It can also enforce a hard cap, which is useful when you want a session to stop before it drifts into runaway usage.

Why tokens per session matters

Per-session token visibility matters for a few very practical reasons. First, it helps when debugging long agentic loops. If a coding session keeps spiraling, you want to know whether it is burning a few thousand tokens or a few hundred thousand. Second, it lets you compare approaches. A prompt strategy that feels slightly better is not automatically better if it costs 3x more tokens. Third, a lot of people are working inside free-tier limits, client budgets, or internal project caps. In those cases, “we will find out later” is not an acceptable monitoring strategy.

What you usually want is a simple live readout: how many tokens this session has consumed so far, what that roughly means in dollars, and whether you are close to a limit you actually care about. Something like this:

[agent-bill-guard] tokens: 23,841 / 100,000 | est. $0.71
[agent-bill-guard] tokens: 31,422 / 100,000 | est. $0.94
[agent-bill-guard] tokens: 48,905 / 100,000 | est. $1.47
[agent-bill-guard] tokens: 99,188 / 100,000 | est. $2.98
[agent-bill-guard] cap reached, blocking further requests
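A readout like that needs very little state. The sketch below is a minimal illustration, not agent-bill-guard's code. It keeps input and output totals separately because the API prices them differently, and the per-million-token prices in it are placeholder assumptions you would replace with the current rates for your model. SessionMeter and its methods are hypothetical names.

```python
from dataclasses import dataclass

# Placeholder prices in USD per million tokens. These are assumptions for
# illustration only; substitute the current rates for the model you use.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


@dataclass
class SessionMeter:
    """Hypothetical running token and cost totals for one session."""
    cap: int
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens

    @property
    def estimated_cost(self) -> float:
        # Input and output are priced differently, so weight them separately.
        return (self.input_tokens * INPUT_PRICE_PER_MTOK
                + self.output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

    def record(self, input_tokens: int, output_tokens: int) -> None:
        # Add one API response's usage to the session totals.
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    def cap_reached(self) -> bool:
        return self.total >= self.cap

    def readout(self) -> str:
        return (f"[meter] tokens: {self.total:,} / {self.cap:,} "
                f"| est. ${self.estimated_cost:.2f}")


meter = SessionMeter(cap=100_000)
meter.record(input_tokens=20_000, output_tokens=3_841)
print(meter.readout())  # → [meter] tokens: 23,841 / 100,000 | est. $0.12
if meter.cap_reached():
    print("[meter] cap reached, blocking further requests")
```

The dollar figure depends entirely on the prices you plug in; the token count is the part that comes straight from the API.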

Why Claude Code hooks are not enough

A reasonable first thought is: can this be done with Claude Code hooks? Not really, at least not for actual token accounting. Hooks like PreToolUse and PostToolUse tell you when tool calls happen. That can be useful for logging workflow events, but it does not surface token counts from the model response. Tool hooks are about tool execution, not the raw API usage payload.

So if your goal is true Claude Code token usage tracking, hooks alone do not solve it. They do not give you the thing you actually need: the usage.input_tokens and usage.output_tokens values returned by the Anthropic API.
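For reference, here is roughly what pulling those values out of a non-streaming Messages API response looks like. One detail worth knowing: when prompt caching is in play, the usage object also reports cache_creation_input_tokens and cache_read_input_tokens, and input_tokens then counts only the uncached part of the prompt. The helper name usage_from_response is hypothetical and the sample body is made up for illustration.

```python
import json


def usage_from_response(body: bytes) -> dict:
    """Hypothetical helper: token counts from a non-streaming response body."""
    data = json.loads(body.decode("utf-8"))
    usage = data.get("usage") or {}
    fields = ("input_tokens", "output_tokens",
              "cache_creation_input_tokens", "cache_read_input_tokens")
    # Cache fields may be missing or null; treat both as zero.
    return {name: int(usage.get(name) or 0) for name in fields}


# Made-up response body where most of the prompt was served from cache.
sample = json.dumps({
    "type": "message",
    "usage": {"input_tokens": 12, "output_tokens": 250,
              "cache_creation_input_tokens": 0,
              "cache_read_input_tokens": 18000},
}).encode("utf-8")

counts = usage_from_response(sample)
print(sum(counts.values()))  # → 18262
```

Counting only input_tokens and output_tokens here would report 262 tokens for a request whose prompt was over 18,000 tokens long, which is why the cache fields matter for anything prompt-heavy.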

Why the proxy approach works

The proxy approach is more blunt, but it works because it sits on the actual request path. Claude Code sends requests to the proxy (it can be pointed at a local endpoint through the ANTHROPIC_BASE_URL environment variable), the proxy forwards them to Anthropic, and then the proxy reads the raw response JSON before passing it back. That means it can inspect the usage fields directly and keep a running total for the current session.

In practice, that gives you something close to what Claude Code should arguably expose natively, but currently does not: live token accounting and an optional hard stop. It is not a built-in solution; it is a local proxy. That means one extra layer in the stack, and it is only as accurate as the API usage data it sees. For day-to-day monitoring, though, that is exactly what you want, because those numbers come from the API's own accounting rather than from an estimate.
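To make the request path concrete, here is a minimal stdlib-only forwarding proxy in the same spirit. It is a sketch under simplifying assumptions, not agent-bill-guard's implementation: it handles only POST, buffers each whole response, reads usage only from plain JSON bodies, and once a hypothetical CAP is reached it refuses further requests with HTTP 429 (an arbitrary choice of status code). MeteringProxy, make_server and the default port are illustrative names.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

UPSTREAM = "https://api.anthropic.com"  # requests are forwarded here
CAP = 100_000                            # hypothetical hard cap, in tokens
# Headers urllib sets itself or that only apply to the client-to-proxy hop.
HOP_HEADERS = {"host", "content-length", "connection", "accept-encoding"}

_lock = threading.Lock()
session_tokens = 0


class MeteringProxy(BaseHTTPRequestHandler):
    """Forwards POST requests upstream and tallies the usage they report."""

    def do_POST(self):
        global session_tokens
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))

        with _lock:
            over_cap = session_tokens >= CAP
        if over_cap:
            # Hard stop: refuse to forward once the session total hits the cap.
            self._reply(429, b'{"error": "token cap reached"}', "application/json")
            return

        headers = {k: v for k, v in self.headers.items()
                   if k.lower() not in HOP_HEADERS}
        req = urllib.request.Request(UPSTREAM + self.path, data=body,
                                     headers=headers, method="POST")
        try:
            with urllib.request.urlopen(req) as resp:
                status, payload = resp.status, resp.read()
                ctype = resp.headers.get("Content-Type", "application/json")
        except urllib.error.HTTPError as err:
            # Pass API errors back to the client unchanged.
            status, payload = err.code, err.read()
            ctype = err.headers.get("Content-Type", "application/json")

        try:
            data = json.loads(payload)
        except ValueError:
            data = None  # not plain JSON, e.g. a streamed (SSE) response
        usage = (data.get("usage") or {}) if isinstance(data, dict) else {}

        tokens = int(usage.get("input_tokens") or 0) + int(usage.get("output_tokens") or 0)
        with _lock:
            session_tokens += tokens
            total = session_tokens
        print(f"[proxy] tokens: {total:,} / {CAP:,}")
        self._reply(status, payload, ctype)

    def _reply(self, status, payload, ctype):
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port=8082):
    """Build (but do not start) the proxy on localhost; the port is illustrative."""
    return ThreadingHTTPServer(("127.0.0.1", port), MeteringProxy)
```

Calling make_server().serve_forever() would run it, and the client would then need its base URL pointed at http://127.0.0.1:8082. Because the cap is checked before forwarding, the last allowed request can overshoot the limit a little, which is also why a readout can sit just under the cap right before blocking.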

The core logic is straightforward. After receiving the API response, the proxy reads the usage object and adds input and output tokens to the session total:

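import json

# session_tokens, cap and estimated_cost are maintained by the surrounding
# proxy code; they are not defined in this excerpt.
data = json.loads(response_body.decode("utf-8"))

# Read the token counts Anthropic reports for this request.
usage = data.get("usage", {})
input_tokens = int(usage.get("input_tokens", 0))
output_tokens = int(usage.get("output_tokens", 0))

# Add this request's tokens to the running session total.
request_tokens = input_tokens + output_tokens
session_tokens += request_tokens

print(
    f"[agent-bill-guard] tokens: {session_tokens:,} / {cap:,} | est. ${estimated_cost:.2f}"
)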

That is the whole idea. No SDK magic, no external dependency chain, no guessing based on prompts or tool activity. Just read the usage numbers from the Anthropic response and track them live.
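One caveat: reading usage this way assumes a complete JSON body. If the client asks for a streamed response ("stream": true), the body is a sequence of server-sent events instead, and the counts arrive in pieces: input tokens in the message_start event, and a running output total in message_delta events. The sketch below shows where to look. usage_from_sse is a hypothetical helper, and it assumes the whole stream has already been buffered as text.

```python
import json


def usage_from_sse(stream_text: str) -> tuple:
    """Hypothetical helper: (input_tokens, output_tokens) from a buffered SSE body."""
    input_tokens = output_tokens = 0
    for line in stream_text.splitlines():
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        event = json.loads(line[len("data:"):].strip())
        if event.get("type") == "message_start":
            usage = event.get("message", {}).get("usage") or {}
            input_tokens = int(usage.get("input_tokens") or 0)
            output_tokens = int(usage.get("output_tokens") or 0)
        elif event.get("type") == "message_delta":
            usage = event.get("usage") or {}
            # output_tokens here is a running total, not an increment.
            output_tokens = int(usage.get("output_tokens") or output_tokens)
    return input_tokens, output_tokens


# Abbreviated, made-up stream (content_block_* events omitted).
sample = "\n".join([
    "event: message_start",
    'data: {"type": "message_start", "message": {"usage": {"input_tokens": 25, "output_tokens": 1}}}',
    "",
    "event: message_delta",
    'data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 15}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
])
print(usage_from_sse(sample))  # → (25, 15)
```

A real proxy would parse events incrementally as chunks arrive, since it has to relay the stream to the client without waiting for it to finish.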

A practical fix for a practical problem

If you only use Claude Code occasionally, maybe delayed billing visibility is tolerable. If you use it heavily, it becomes a blind spot. That is especially true when you are iterating on prompts, testing agent behavior, or handing sessions to teammates who need guardrails. In those cases, seeing token usage during the session is materially better than seeing it later in an invoice.

That is what agent-bill-guard is for. It does one narrow job: expose live token and cost information for Claude Code sessions, and optionally stop usage at a hard cap. It is not pretending to be more than that.

To try it locally, run:

git clone https://github.com/paprika-org/agent-bill-guard
cd agent-bill-guard
python proxy.py --cap 100000

The project is on GitHub at https://github.com/paprika-org/agent-bill-guard.