Developer Guide

Claude Code tool-use monitoring: practical logging and guardrails for agent runs

Published for engineers running Claude Code in local or CI workflows

Claude Code tool-use monitoring matters once an agent can run shell commands, read files, or invoke other tools on your machine. If you need an audit trail, or you want to stop certain actions before they happen, transcripts are not enough: you need structured logs and an enforcement layer around each Claude Code process.

That is the problem agentcheck is built to solve. It wraps Claude Code with hook-based guardrails, writes tool activity to JSONL, and applies YAML rule packs that constrain what an agent can do. This is useful when you want repeatable visibility into tool calls, especially in team environments, CI jobs, or any workflow where "the agent probably did X" is not good enough.

What you actually need to monitor

For most developer workflows, good monitoring is not about recording everything forever. It is about answering a few concrete questions quickly:

- Which tools did the agent invoke, and how often?
- What shell commands actually ran, with what arguments?
- Was anything denied by policy, and did the agent retry it?
- Did the agent read or write files it should not have touched?

Tool-use monitoring is most useful when logs are machine-readable. JSONL works well because each event is a standalone line: you can inspect it with jq, stream it, or ship it into a log pipeline without special parsing.
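Because each event is one complete line, even a plain shell loop can act on a live stream without buffering the whole file. A minimal sketch; the decision field name is an assumption based on the event shape agentcheck emits:

```shell
# Each JSONL line is a self-contained event, so a while-read loop
# can react to events one at a time (e.g. from `tail -f` in practice).
printf '%s\n' \
  '{"ts":"2026-04-23T12:00:01Z","tool":"shell","decision":"allow"}' \
  '{"ts":"2026-04-23T12:00:05Z","tool":"shell","decision":"deny"}' |
while IFS= read -r event; do
  case "$event" in
    # Flag any denied action as soon as it appears in the stream.
    *'"decision":"deny"'*) echo "DENIED: $event" ;;
  esac
done
```

The same loop works against `tail -f agentcheck.log.jsonl` for live sessions, which is the main practical payoff of one-event-per-line logs.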

Install and run agentcheck

The install path is intentionally simple:

npm install -g github:paprika-org/agentcheck

Once installed, run Claude Code through the wrapper instead of launching it directly. The exact invocation depends on how you use Claude Code locally, but the pattern is the same: start the agent process under agentcheck so hooks can observe tool calls and apply policies.

# Example pattern: run Claude Code through agentcheck
agentcheck claude

# Or pass through normal arguments
agentcheck claude -p "inspect the repo and summarize failing tests"

If your environment writes logs to a dedicated directory, keep that directory outside the repo and make rotation explicit. Agent traces can include file paths, command arguments, and snippets of content, so treat them as operational data, not harmless debug output.
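One way to make rotation explicit is a small date-stamped scheme with a pruning step. The directory name and the 14-day retention window below are assumptions for illustration, not agentcheck defaults:

```shell
# Keep agent traces in a dedicated directory outside the repo.
LOG_DIR="$HOME/.agent-logs"
mkdir -p "$LOG_DIR"

# One file per day keeps rotation trivial: no renaming, no locking.
LOG_FILE="$LOG_DIR/agentcheck-$(date +%Y%m%d).jsonl"
touch "$LOG_FILE"

# Prune traces older than 14 days; they may contain paths and
# command arguments, so do not let them accumulate indefinitely.
find "$LOG_DIR" -name 'agentcheck-*.jsonl' -mtime +14 -delete
```

Running this at the start of each session (or from cron) keeps the trace directory bounded without any extra tooling.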

Example: inspect tool activity with JSONL

A typical JSONL event stream for tool monitoring should be boring and parseable. That is a good thing. Here is the kind of output you want to be able to query:

{"ts":"2026-04-23T12:00:01Z","tool":"shell","args":{"cmd":"ls -la"},"decision":"allow"}
{"ts":"2026-04-23T12:00:03Z","tool":"read_file","args":{"path":"src/index.ts"},"decision":"allow"}
{"ts":"2026-04-23T12:00:05Z","tool":"shell","args":{"cmd":"rm -rf /tmp/build"},"decision":"deny","rule":"block-destructive-shell"}

Once you have that, normal Unix tooling becomes enough for quick analysis:

# Count tool usage by tool name
jq -r '.tool' agentcheck.log.jsonl | sort | uniq -c | sort -nr

# Show denied actions
jq 'select(.decision == "deny")' agentcheck.log.jsonl

# Review shell commands only
jq 'select(.tool == "shell") | {ts, cmd: .args.cmd, decision, rule}' agentcheck.log.jsonl

This is where the monitoring becomes practical. You are no longer reading a chat transcript trying to infer whether the agent used grep, touched a secret file, or retried a blocked command five times.
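A useful follow-up query is spotting commands the agent kept retrying after a denial. This assumes the same event shape (.args.cmd, .decision) shown in the sample log lines:

```shell
# Group denied shell commands and count repeats: a count above 1
# means the agent retried an action that policy already blocked.
jq -r 'select(.decision == "deny") | .args.cmd' agentcheck.log.jsonl \
  | sort | uniq -c | sort -nr
```

Repeated denials are worth reviewing either way: they indicate a prompt that keeps steering the agent into a blocked action, or a rule that is too broad for the task.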

Example: constrain behavior with YAML rule packs

Logging alone is passive. If you already know what must not happen, add rules and fail fast. A rule pack can block high-risk commands, prevent writes outside a working directory, or restrict tools entirely for certain jobs.

rules:
  - id: block-destructive-shell
    when:
      tool: shell
      args:
        cmd_regex: '(^|\s)(rm\s+-rf|mkfs|dd\s+if=)(\s|$)'
    action: deny

  - id: block-home-directory-writes
    when:
      tool: write_file
      args:
        path_regex: '^/home/'
    action: deny

  - id: allow-readonly-repo-analysis
    when:
      tool: shell
      args:
        cmd_regex: '^(ls|find|rg|cat|git\s+status|git\s+diff)'
    action: allow

That setup will not solve every policy problem, but it covers the common ones: destructive shell calls, writes to the wrong place, and enforcing a read-only analysis mode. For CI, this is often enough to turn an agent from "interesting but risky" into something you can actually run in automation.
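In CI, the simplest enforcement on top of a rule pack is to fail the job whenever any deny event was logged. A hedged sketch, assuming the wrapper writes to agentcheck.log.jsonl (adjust the path to wherever your setup puts the trace):

```shell
# Fail the CI step if any tool call was denied during the agent run.
# A denied call usually means the agent attempted something the rule
# pack forbids, which is worth a human look before merging.
if grep -q '"decision":"deny"' agentcheck.log.jsonl; then
  echo "agentcheck: denied tool calls found; failing the build" >&2
  grep '"decision":"deny"' agentcheck.log.jsonl >&2
  exit 1
fi
```

A plain grep is deliberately conservative here: it may over-match if a command argument happens to contain the literal string, but in CI a false positive that gets a human look is the safer failure mode.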

What this solves well, and what it does not

There are real limits here, and they matter.

The honest framing is this: agentcheck gives you structured visibility and enforceable constraints around Claude Code, but it is not a full sandbox. If you need isolation from the host system, use OS-level controls as well. Behavioral governance and system isolation solve different parts of the problem.

Recommended baseline for teams

If you are introducing Claude Code tool-use monitoring for a team, a reasonable baseline looks like this:

- Run every agent session through agentcheck so tool calls are always observed.
- Write JSONL logs to a dedicated directory outside the repo, with explicit rotation.
- Start with a small rule pack that blocks destructive shell commands and out-of-bounds writes.
- Review denied events periodically and promote recurring patterns into new rules.

That gives you something operationally useful without overengineering the first pass.

Practical takeaway: if your current answer to "what did the agent actually do?" is "read the transcript," you do not have reliable tool monitoring yet.

Call to action

If you want a straightforward starting point for Claude Code tool-use monitoring, install agentcheck, run Claude Code through it, and inspect the JSONL output before writing any complex rules. Then add a small YAML rule pack that blocks destructive shell commands and out-of-bounds writes. That sequence gives you immediate visibility and a sane first layer of control.

Project repo: github.com/paprika-org/agentcheck

Install with npm install -g github:paprika-org/agentcheck and use the repository examples as the starting point for your own rule packs.