butthtio/solidity-cot-auditor
solidity-cot-auditor
Multi-role chain-of-thought LLM pipeline for Solidity security auditing, layered on top of Slither output.
Install · Quick Start · How It Works · Configuration · Results
Static analyzers like Slither are fast and reliable, but their output is terse. A finding like reentrancy-eth tells you what fired, not why it matters in this specific contract, how an attacker would exploit it, or what the minimal fix looks like. This tool fills that gap.
solidity-cot-auditor takes Slither's JSON output and runs each finding through a four-role LLM chain:
Slither finding
│
▼
[Explainer] — technical explanation + true/false positive verdict
│
▼
[ExploitWriter] — minimal PoC sketch (for defenders)
│
▼
[Fixer] — unified diff of the minimal fix
│
▼
[Judge] — quality score + flags logical errors in the chain
│
▼
Markdown + JSON report
Each role is a separate LLM call with a focused system prompt. The chain-of-thought is preserved in the output so you can inspect each step.
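As a rough illustration of the chain above, the sketch below runs each role as its own call and passes earlier outputs forward. It is not the project's actual code: the prompt texts, the `LLMCall` type, `run_chain`, and the `skip` parameter are all hypothetical names chosen for this example, and `fake_llm` stands in for a real model call so the sketch runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical type: any function taking (system_prompt, user_message)
# and returning the model's text reply.
LLMCall = Callable[[str, str], str]

# Illustrative system prompts, one per role. The real prompts are not
# shown in this README, so these are placeholders.
ROLE_PROMPTS = {
    "explainer": "Explain the finding and give a true/false positive verdict.",
    "exploit_writer": "Sketch a minimal proof of concept for defenders.",
    "fixer": "Produce a unified diff of the minimal fix.",
    "judge": "Score the chain and flag logical errors.",
}


@dataclass
class ChainResult:
    finding: str
    # Role name -> that role's output, kept so every step can be inspected.
    steps: dict[str, str] = field(default_factory=dict)


def run_chain(
    finding: str, llm: LLMCall, skip: frozenset[str] = frozenset()
) -> ChainResult:
    """Run each role as a separate call, feeding earlier outputs forward."""
    result = ChainResult(finding=finding)
    for role, system_prompt in ROLE_PROMPTS.items():
        if role in skip:
            continue
        # Each role sees the finding plus every earlier role's output.
        context = "\n\n".join(
            [f"Finding:\n{finding}"]
            + [f"[{name}]\n{text}" for name, text in result.steps.items()]
        )
        result.steps[role] = llm(system_prompt, context)
    return result


def fake_llm(system_prompt: str, user_message: str) -> str:
    # Stand-in for a real model call.
    return f"reply to: {system_prompt[:20]}"


demo = run_chain("reentrancy-eth in withdraw()", fake_llm, skip=frozenset({"judge"}))
print(list(demo.steps))  # ['explainer', 'exploit_writer', 'fixer']
```

Keeping every role's output in `steps` is what lets a later role, or a human reader, check the reasoning of an earlier one.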
Install
pip install -e ".[dev]"
# slither is a separate install (requires solc)
pip install slither-analyzer
Quick Start
Audit a .sol file directly:
export OPENAI_API_KEY=sk-...
solidity-cot audit ./contracts/MyToken.sol --output reports/
Audit from a saved Slither JSON (useful in CI):
slither MyToken.sol --json slither_out.json
solidity-cot audit-json slither_out.json --project MyToken --source-root ./contracts
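For readers wiring this into CI, here is a minimal sketch of pulling findings out of a saved Slither JSON file. It assumes the usual layout of Slither's `--json` output (a top-level `success` flag and a `results.detectors` list whose entries carry `check`, `impact`, `confidence`, and `description`). The `load_findings` helper is a hypothetical name for illustration, not this tool's actual parser.

```python
import json
from typing import Any


def load_findings(slither_json: str) -> list[dict[str, Any]]:
    """Extract a flat list of findings from Slither's JSON text."""
    data = json.loads(slither_json)
    if not data.get("success", False):
        raise ValueError(f"Slither reported an error: {data.get('error')}")
    # "results" may lack a "detectors" key when nothing fired.
    detectors = data.get("results", {}).get("detectors", [])
    return [
        {
            "check": d.get("check"),
            "impact": d.get("impact"),
            "confidence": d.get("confidence"),
            "description": (d.get("description") or "").strip(),
        }
        for d in detectors
    ]


# Hand-written sample in the assumed layout, not real Slither output.
sample = json.dumps(
    {
        "success": True,
        "error": None,
        "results": {
            "detectors": [
                {
                    "check": "reentrancy-eth",
                    "impact": "High",
                    "confidence": "Medium",
                    "description": "Reentrancy in SimpleBank.withdraw()\n",
                }
            ]
        },
    }
)

findings = load_findings(sample)
print(findings[0]["check"], findings[0]["impact"])  # reentrancy-eth High
```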
Try it on the included example:
solidity-cot audit examples/contracts/SimpleBank.sol --skip-judge
How It Works
Role separation
Each role has a narrow, well-defined job. This matters because:
- A single "audit everything" prompt hallucinates more and produces generic output.
- Separating roles lets you swap or skip stages (e.g., skip exploit writing for informational findings).
- The Judge role catches when earlier roles contradict themselves or miss the point.
Severity-weighted filtering
Findings are filtered by severity before entering the chain. The default is --min-severity medium. Informational findings (pragma version, naming conventions) are skipped unless you explicitly lower the threshold.
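The threshold logic can be sketched as a simple rank comparison. This is an illustration only: the `SEVERITY_RANK` table, the `filter_by_severity` name, the placement of `optimization` below `informational`, and the choice to drop findings with an unrecognized impact are all assumptions, not confirmed behavior of the tool.

```python
from typing import Any

# Assumed ordering of Slither impact levels, lowest to highest.
SEVERITY_RANK = {
    "optimization": 0,
    "informational": 1,
    "low": 2,
    "medium": 3,
    "high": 4,
}


def filter_by_severity(
    findings: list[dict[str, Any]], min_severity: str = "medium"
) -> list[dict[str, Any]]:
    """Keep findings whose impact is at or above the threshold."""
    threshold = SEVERITY_RANK[min_severity.lower()]
    kept = []
    for finding in findings:
        impact = str(finding.get("impact", "")).lower()
        # Unknown impact levels rank below everything and are dropped.
        if SEVERITY_RANK.get(impact, -1) >= threshold:
            kept.append(finding)
    return kept


sample_findings = [
    {"check": "reentrancy-eth", "impact": "High"},
    {"check": "solc-version", "impact": "Informational"},
    {"check": "tx-origin", "impact": "Medium"},
]

print([f["check"] for f in filter_by_severity(sample_findings)])
# ['reentrancy-eth', 'tx-origin']
```

Lowering the threshold, for example `filter_by_severity(sample_findings, "informational")`, lets the informational finding through as well.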
LLM compatibility
Any OpenAI-compatible endpoint works. Point the tool at a local vLLM server, Together AI, or Fireworks:
export LLM_BASE_URL=
export LLM_MODEL=meta-llama/Llama-3-70b-instruct
export LLM_API_KEY=dummy
solidity-cot audit MyContract.sol
Anthropic Claude is also supported directly:
export LLM_PROVIDER=anthropic
export LLM_BASE_URL=
export LLM_MODEL=claude-sonnet-4-6
export ANTHROPIC_API_KEY=sk-ant-...
solidity-cot audit MyContract.sol
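One way such environment variables might be resolved into a single settings object is sketched below. Only the variable names come from the examples above. The `LLMConfig` class, the `resolve_llm_config` function, the default provider of `openai`, and the rule that `LLM_API_KEY` wins over `OPENAI_API_KEY` are assumptions made for illustration and may not match the tool's real behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMConfig:
    provider: str
    base_url: Optional[str]
    model: Optional[str]
    api_key: Optional[str]


def resolve_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Build an LLMConfig from environment-style key/value pairs."""
    # Assumption: an unset LLM_PROVIDER means an OpenAI-compatible endpoint.
    provider = env.get("LLM_PROVIDER", "openai")
    if provider == "anthropic":
        api_key = env.get("ANTHROPIC_API_KEY")
    else:
        # Assumption: LLM_API_KEY takes precedence over OPENAI_API_KEY.
        api_key = env.get("LLM_API_KEY") or env.get("OPENAI_API_KEY")
    return LLMConfig(
        provider=provider,
        # Empty strings are treated the same as unset variables.
        base_url=env.get("LLM_BASE_URL") or None,
        model=env.get("LLM_MODEL") or None,
        api_key=api_key,
    )


local_cfg = resolve_llm_config(
    {"LLM_MODEL": "meta-llama/Llama-3-70b-instruct", "LLM_API_KEY": "dummy"}
)
print(local_cfg.provider, local_cfg.api_key)  # openai dummy
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the resolution logic easy to test.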
Configuration
| Flag | Default | Description |
|------|---------|-------------|
| --min-severity | medium | Skip findings below this level |
| --max-findings | 20 | Cap findings sent to the