Claude Code safety: practical guardrails, logs, and rule packs for real agent runs
Claude Code safety is not a vague alignment problem. In practice, it means stopping an agent from doing risky things by default, recording what it actually did, and making the allowed behavior explicit enough that another engineer can review it. If your agent can run shell commands, edit files, or touch credentials, you need controls around the process, not just a carefully worded prompt.
A useful baseline is: wrap every Claude process, log every tool call, and enforce a small rule pack before the command executes. That is what agentcheck is built for. It acts as a behavioral governance layer for Claude Code using hook-based guardrails, JSONL audit logs, and YAML rule packs that constrain what agents can do.
What usually goes wrong
Most Claude Code safety failures are ordinary engineering failures:
- The agent runs a command with broader filesystem access than intended.
- It modifies files outside the target directory because the task description was underspecified.
- It touches production config, secrets, or deployment scripts during a local refactor.
- No one can reconstruct what happened afterward because there is no durable log.
Prompting helps, but prompts are advisory. If the runtime can still execute a bad command, you have not really solved the control problem.
Install and wrap the agent
The simplest setup is to install agentcheck, point it at a rule pack, and launch Claude Code through it.
npm install -g github:paprika-org/agentcheck
A minimal workflow looks like this:
mkdir -p .agentcheck/rules .agentcheck/logs
cat > .agentcheck/rules/default.yaml <<'YAML'
rules:
  - id: block-git-push
    description: Prevent remote writes
    match:
      tool: shell
      command: "(^|\\s)git\\s+push(\\s|$)"
    action: deny
  - id: block-env-read
    description: Prevent reading local env files
    match:
      tool: shell
      command: "(^|\\s)(cat|less|grep)\\s+.*\\.env"
    action: deny
  - id: restrict-paths
    description: Only allow writes in src and tests
    match:
      tool: write
      path_not_matches: "^(src|tests)/"
    action: deny
YAML
agentcheck \
  --rules .agentcheck/rules/default.yaml \
  --log .agentcheck/logs/claude.jsonl \
  -- claude
The exact Claude launch command may vary in your environment, but the pattern is the same: agentcheck sits in front, evaluates each tool invocation, and writes a durable event log.
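Before trusting a rule pack, it can help to run its patterns against a few sample commands. The following is a minimal sketch using Python's `re` module, not agentcheck itself. It assumes the patterns are applied as unanchored regex searches, which the source does not state, and the helper names `shell_denied` and `write_denied` are hypothetical.

```python
import re

# Patterns copied from the rule pack above, after YAML unescaping ("\\s" becomes \s).
BLOCK_GIT_PUSH = re.compile(r"(^|\s)git\s+push(\s|$)")
BLOCK_ENV_READ = re.compile(r"(^|\s)(cat|less|grep)\s+.*\.env")
WRITE_ALLOWED = re.compile(r"^(src|tests)/")


def shell_denied(command: str) -> bool:
    """True if either shell rule matches (assumes unanchored search semantics)."""
    return bool(BLOCK_GIT_PUSH.search(command) or BLOCK_ENV_READ.search(command))


def write_denied(path: str) -> bool:
    """True if the path is outside src/ and tests/, mirroring path_not_matches."""
    return WRITE_ALLOWED.search(path) is None


if __name__ == "__main__":
    for cmd in ["git status --short", "git push origin main", "cat .env"]:
        print(f"{cmd!r}: {'deny' if shell_denied(cmd) else 'allow'}")
    for path in ["src/parser.ts", "deploy/prod.yaml"]:
        print(f"{path!r}: {'deny' if write_denied(path) else 'allow'}")
```

Checking patterns this way only shows what the regexes match. How agentcheck actually interprets them should be confirmed against the project's own documentation.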
What the logs buy you
For Claude Code safety, auditability matters almost as much as prevention. If an agent made a bad edit, you want a timeline: what tool was called, with what arguments, and whether a rule allowed or denied it.
{"ts":"2026-04-25T10:14:09Z","tool":"shell","command":"git status --short","decision":"allow","rule_id":null}
{"ts":"2026-04-25T10:14:18Z","tool":"write","path":"src/parser.ts","decision":"allow","rule_id":null}
{"ts":"2026-04-25T10:14:31Z","tool":"shell","command":"git push origin main","decision":"deny","rule_id":"block-git-push"}
Because the log is JSONL, it is easy to grep, ship to another system, or inspect during CI. With jq, for example, you can list every denied event or count calls per tool:
jq -c 'select(.decision=="deny")' .agentcheck/logs/claude.jsonl
jq -r '.tool' .agentcheck/logs/claude.jsonl | sort | uniq -c
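For reviews that go beyond one-liners, the same log can be summarized in a short script. This is a sketch that assumes the field names shown in the sample events (`decision`, `rule_id`). The function name `summarize_denials` is hypothetical and not part of agentcheck.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_denials(lines: Iterable[str]) -> Counter:
    """Count denied events per rule_id from JSONL lines; blank lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("decision") == "deny":
            # Fall back to "unknown" if a denied event carries no rule_id.
            counts[event.get("rule_id") or "unknown"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '{"ts":"2026-04-25T10:14:09Z","tool":"shell","command":"git status --short","decision":"allow","rule_id":null}',
        '{"ts":"2026-04-25T10:14:31Z","tool":"shell","command":"git push origin main","decision":"deny","rule_id":"block-git-push"}',
    ]
    for rule_id, count in summarize_denials(sample).most_common():
        print(f"{count:4d}  {rule_id}")
```

To run it against a real log, pass an open file handle instead of the sample list, since iterating a file yields one line at a time.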
This also changes team conversations. Instead of arguing about whether an agent is “generally safe,” you can review the exact denied commands and the exact write scope allowed by policy.
Start with narrow rules, then loosen them
A common mistake is trying to define a perfect policy up front. A better pattern is to begin with a restrictive rule pack and expand it only when real tasks require it. For example, if you only want an agent to refactor application code and tests, allow writes to those directories, then deny every other write and the common network transfer tools.
rules:
  - id: allow-src-writes
    match:
      tool: write
      path_matches: "^(src|tests)/"
    action: allow
  - id: deny-other-writes
    match:
      tool: write
    action: deny
  - id: deny-network-side-effects
    match:
      tool: shell
      command: "(curl|wget|scp|rsync)"
    action: deny
That is boring by design. Boring policies are easier to reason about than “smart” policies with too many exceptions.
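Rule order matters in a pack like this if the first matching rule decides the outcome. That evaluation order is an assumption here, not something the source states about agentcheck. The sketch below models it in plain Python to show why allow-src-writes has to come before deny-other-writes. The default of allowing unmatched calls follows the sample log events, where `rule_id` is null and the decision is allow. The dict layout and the `decide` function are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# The rule pack above, restated as Python dicts for illustration only.
RULES: List[Dict[str, str]] = [
    {"id": "allow-src-writes", "tool": "write",
     "path_matches": r"^(src|tests)/", "action": "allow"},
    {"id": "deny-other-writes", "tool": "write", "action": "deny"},
    {"id": "deny-network-side-effects", "tool": "shell",
     "command": r"(curl|wget|scp|rsync)", "action": "deny"},
]


def rule_matches(rule: Dict[str, str], tool: str, arg: str) -> bool:
    """A rule matches when the tool is equal and its pattern, if any, is found in arg."""
    if rule["tool"] != tool:
        return False
    pattern = rule.get("path_matches") or rule.get("command")
    # A rule without a pattern matches every call to that tool.
    return pattern is None or re.search(pattern, arg) is not None


def decide(tool: str, arg: str,
           rules: Optional[List[Dict[str, str]]] = None) -> Tuple[str, Optional[str]]:
    """Return (action, rule_id) of the first matching rule (assumed first-match-wins)."""
    for rule in (RULES if rules is None else rules):
        if rule_matches(rule, tool, arg):
            return rule["action"], rule["id"]
    return "allow", None  # assumed default when no rule matches


if __name__ == "__main__":
    print(decide("write", "src/parser.ts"))
    print(decide("write", "deploy/prod.yaml"))
    # With the order reversed, the catch-all deny shadows the allow rule.
    print(decide("write", "src/parser.ts", list(reversed(RULES))))
```

If agentcheck resolves conflicts differently, for instance by letting deny always win, the pack would need restructuring, so the evaluation order is worth confirming before relying on it.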
Where this helps most
This approach is useful when Claude Code is running in one of these modes:
- Inside a real repository with deployment scripts, infrastructure code, or secrets nearby.
- In CI or a shared runner where accidental shell access has a larger blast radius.
- As part of a repeated workflow where you want the same controls every run.
- On teams that need a reviewable record of agent behavior.
Honest limitations
There are a few practical limits worth stating clearly.
- Regex-based command matching is only as good as the patterns you write. If your rules are sloppy, bypasses become more likely.
- Logging tool calls does not automatically capture intent. You still need human review for high-risk workflows.
- Path restrictions help a lot, but shell commands can have indirect effects that are harder to model than simple file writes.
- Agent wrappers are strongest when paired with OS-level isolation such as containers, restricted tokens, or ephemeral workspaces.
So Claude Code safety should be treated as layered defense: runtime guardrails, audit logs, limited credentials, and an execution environment with a small blast radius.
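The regex limitation is easy to demonstrate. The sketch below reuses two patterns from the rule packs above with Python's `re` module, again assuming unanchored search semantics. It shows both directions of failure: commands that still push to a remote but are not matched, and harmless commands that are matched anyway.

```python
import re

GIT_PUSH = re.compile(r"(^|\s)git\s+push(\s|$)")
NETWORK = re.compile(r"(curl|wget|scp|rsync)")


def matches(pattern: "re.Pattern[str]", command: str) -> bool:
    """True if the pattern is found anywhere in the command string."""
    return pattern.search(command) is not None


# Commands that would still push, yet slip past the git-push pattern.
BYPASSES = [
    "git -C . push origin main",  # an option sits between "git" and "push"
    "sh -c 'git push'",           # quotes replace the expected whitespace
]

# Harmless commands that the patterns flag anyway.
FALSE_POSITIVES = [
    (GIT_PUSH, "echo git push is disabled"),
    (NETWORK, "cat docs/curl-examples.md"),
]

if __name__ == "__main__":
    for cmd in BYPASSES:
        print(f"bypass         {cmd!r}: matched={matches(GIT_PUSH, cmd)}")
    for pattern, cmd in FALSE_POSITIVES:
        print(f"false positive {cmd!r}: matched={matches(pattern, cmd)}")
```

Tightening the patterns closes individual gaps, but the list of shell spellings is open-ended, which is why the OS-level isolation mentioned above matters.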
A practical team baseline
If you need a starting point for a small engineering team, this is a reasonable baseline:
1. Run every Claude Code session through agentcheck.
2. Log all tool calls to a per-run JSONL file.
3. Deny network transfer commands (such as curl, wget, scp, and rsync) and remote git operations by default.
4. Restrict writes to a narrow set of project directories.
5. Review denied events weekly and refine the rule pack.
6. Use separate credentials or no credentials for agent sessions.
That setup will not solve every edge case, but it will catch a large class of expensive mistakes and make the remaining ones easier to investigate.
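Items 1 and 2 of the baseline can be scripted so that every session gets its own log file. The following is a sketch under stated assumptions: the `--rules` and `--log` flags and the `-- claude` form are taken from the launch command earlier in this article, while the timestamped file naming and the helper names `per_run_log_path` and `build_command` are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


def per_run_log_path(log_dir: str, now: Optional[datetime] = None) -> Path:
    """Build a log path with a UTC timestamp so each run writes a separate file."""
    now = now or datetime.now(timezone.utc)
    return Path(log_dir) / f"claude-{now.strftime('%Y%m%dT%H%M%SZ')}.jsonl"


def build_command(rules: str, log_path: Path) -> List[str]:
    """Assemble the argv for a wrapped session, using the flags shown earlier."""
    return ["agentcheck", "--rules", rules, "--log", str(log_path), "--", "claude"]


if __name__ == "__main__":
    log_path = per_run_log_path(".agentcheck/logs")
    # Print the command rather than running it; pass the list to subprocess.run to launch.
    print(" ".join(build_command(".agentcheck/rules/default.yaml", log_path)))
```

Per-run files keep each session's timeline separate, which makes the weekly review of denied events a matter of iterating over one directory.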
Want a concrete way to improve Claude Code safety? Start by wrapping your next agent run with agentcheck, add a small YAML rule pack, and inspect the JSONL output after a real task. The project is on GitHub: github.com/paprika-org/agentcheck.