butthtio/solidity-cot-auditor

436

+59/day

281

PythonAI/ML

Multi-role chain-of-thought LLM pipeline for Solidity security auditing, layered on top of Slither output.

From the README

solidity-cot-auditor

Multi-role chain-of-thought LLM pipeline for Solidity security auditing

Install · Quick Start · How It Works · Configuration · Results

Static analyzers like Slither are fast and reliable, but their output is terse. A finding like reentrancy-eth tells you what fired, not why it matters in this specific contract, how an attacker would exploit it, or what the minimal fix looks like. This tool fills that gap.

solidity-cot-auditor takes Slither's JSON output and runs each finding through a four-role LLM chain:

Slither finding
    │
    ▼
[Explainer]  — technical explanation + true/false positive verdict
    │
    ▼
[ExploitWriter]  — minimal PoC sketch (for defenders)
    │
    ▼
[Fixer]  — unified diff of the minimal fix
    │
    ▼
[Judge]  — quality score + flags logical errors in the chain
    │
    ▼
Markdown + JSON report

Each role is a separate LLM call with a focused system prompt. The chain-of-thought is preserved in the output so you can inspect each step.

Install

pip install -e ".[dev]"
# slither is a separate install (requires solc)
pip install slither-analyzer

Quick Start

Audit a .sol file directly:

export OPENAI_API_KEY=sk-...
solidity-cot audit ./contracts/MyToken.sol --output reports/

Audit from a saved Slither JSON (useful in CI):

slither MyToken.sol --json slither_out.json
solidity-cot audit-json slither_out.json --project MyToken --source-root ./contracts

Try it on the included example:

solidity-cot audit examples/contracts/SimpleBank.sol --skip-judge

How It Works

Role separation

Each role has a narrow, well-defined job. This matters because:

A single "audit everything" prompt hallucinates more and produces generic output.
Separating roles lets you swap or skip stages (e.g., skip exploit writing for informational findings).
The Judge role catches when earlier roles contradict themselves or miss the point.

Contested-weighted filtering

Findings are filtered by severity before entering the chain. The default is --min-severity medium. Informational findings (pragma version, naming conventions) are skipped unless you explicitly lower the threshold.

LLM compatibility

Any OpenAI-compatible endpoint works. Point at a local vLLM server, Together AI, or Fireworks:

export LLM_BASE_URL=
export LLM_MODEL=meta-llama/Llama-3-70b-instruct
export LLM_API_KEY=dummy
solidity-cot audit MyContract.sol

Anthropic Claude is also supported directly:

export LLM_PROVIDER=anthropic
export LLM_BASE_URL=
export LLM_MODEL=claude-sonnet-4-6
export ANTHROPIC_API_KEY=sk-ant-...
solidity-cot audit MyContract.sol

Configuration

| Flag | Default | Description | |------|---------|-------------| | --min-severity | medium | Skip findings below this level | | --max-findings | 20 | Cap findings sent to the

View on GitHub