SPECA: A Specification-to-Checklist Agentic Auditing Framework
Paper: Masato Kamba, Hirotake Murakami, Akiyoshi Sannai. Beyond Code Reasoning: A Specification-Anchored Audit Framework for Expert-Augmented Security Verification. arXiv preprint arXiv:2604.26495, 2026.
Abstract
Security-critical software is routinely audited by tools that reason about vulnerabilities as repository-local code patterns. Yet specification-governed systems — protocol stacks, consensus implementations, cryptographic libraries — are constrained by invariants and correctness conditions defined in natural-language specifications. When a vulnerability arises from what the specification requires rather than how code is written, code-level approaches lack the representational vocabulary to detect it, and their false positives resist systematic diagnosis.
SPECA is a specification-anchored security audit framework that derives explicit, typed security properties from natural-language specifications and audits implementations through structured proof-attempt reasoning grounded in each property. The framework yields three capabilities absent from code-driven auditing:
- Spec-dependent detections that no code-local pattern matcher can express.
- Controlled cross-implementation comparison under a shared property vocabulary.
- False positives that decompose into interpretable, pipeline-phase-traceable root causes.
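The three capabilities above all hang off one idea: a shared, typed property vocabulary derived from the specification. A minimal sketch of what such a vocabulary could look like, assuming a SPECA-like pipeline (all names, fields, and the `compare` helper are illustrative, not SPECA's actual API):

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical types; illustrative only, not SPECA's actual API.

class PropertyKind(Enum):
    INVARIANT = "invariant"          # must hold in every reachable state
    PRECONDITION = "precondition"    # must hold before an operation runs
    POSTCONDITION = "postcondition"  # must hold after an operation completes

class Verdict(Enum):
    HOLDS = "holds"            # proof attempt succeeded
    VIOLATED = "violated"      # counterexample found in the implementation
    UNRESOLVED = "unresolved"  # evidence insufficient either way

@dataclass(frozen=True)
class SpecProperty:
    prop_id: str        # stable ID shared across all audited implementations
    kind: PropertyKind
    source_clause: str  # pointer back into the natural-language specification
    statement: str      # the derived, checkable obligation

@dataclass
class Finding:
    prop: SpecProperty
    implementation: str  # which implementation this proof attempt targeted
    verdict: Verdict
    evidence: str        # structured proof-attempt reasoning trace

def compare(findings: list[Finding], prop_id: str) -> dict[str, Verdict]:
    """Cross-implementation comparison reduces to a group-by on prop_id."""
    return {f.implementation: f.verdict
            for f in findings if f.prop.prop_id == prop_id}

p = SpecProperty("SIG-01", PropertyKind.INVARIANT,
                 "spec §4.2", "signatures are verified before state updates")
findings = [Finding(p, "impl-a", Verdict.HOLDS, "..."),
            Finding(p, "impl-b", Verdict.VIOLATED, "...")]
print(compare(findings, "SIG-01"))
```

Because every finding is anchored to a `prop_id` and a `source_clause`, the same property can be checked across implementations, and a false positive can be traced back to whichever pipeline phase produced the faulty property or proof step.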
Headline Results
- Sherlock Ethereum Fusaka Audit Contest (366 submissions, 10 implementations): SPECA recovers all 15 in-scope H/M/L vulnerabilities (5H/2M/8L) and independently discovers 4 bugs confirmed by developer fix commits — including a cryptographic invariant violation absent from all 366 adjudicated contest submissions.
- RepoAudit C/C++ benchmark (15 projects, 35 non-disputed ground-truth bugs): SPECA matches the best published precision (88.9%, Sonnet 4.5) while surfacing 12 author-validated candidate bugs beyond the established ground truth — two confirmed by upstream maintainers.
- All false positives in the deep analysis (N=16) trace to three interpretable root causes — trust boundary misunderstanding (50%), code reading error (37.5%), specification misinterpretation (12.5%) — each mapped to a specific pipeline phase.
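As a quick sanity check, the reported root-cause percentages follow from a simple tally over the N=16 false positives; the per-cause counts of 8, 6, and 2 below are inferred from the stated percentages, not taken from the paper's raw data:

```python
from collections import Counter

# Counts inferred from the reported percentages over N=16 false positives.
labels = (["trust boundary misunderstanding"] * 8
          + ["code reading error"] * 6
          + ["specification misinterpretation"] * 2)

for cause, n in Counter(labels).most_common():
    print(f"{cause}: {n}/{len(labels)} = {100 * n / len(labels):.1f}%")
# 8/16 = 50.0%, 6/16 = 37.5%, 2/16 = 12.5%
```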
See Evaluation for full numbers and charts.
Table of Contents
- Why SPECA?
- Quick Start
- Demo
- Architecture
- Phases
- Running on GitHub Actions
- Configuration
- Evaluation — RQ1 Sherlock + RQ2 RepoAudit, with charts
- Reproducing the Benchmarks
- Contributing
- Citation
- License
Why SPECA?
Existing LLM-based auditors begin from the code and work outward — scanning a repository for bug-pattern templates, dataflow anoma