The Benchmark

The Comprehensive Verilog Design Problems (CVDP) benchmark is the field's most rigorous evaluation of RTL automation. It tests systems across four task categories — Code Completion, Spec-to-RTL, Code Modification, and Code Debug — spanning a wide range of difficulty levels. We evaluate against ACE-RTL, the current state-of-the-art specialized system, across seven generations of agent development (Gen 0 through Gen 6).

Results

Agentrys reaches 95.8% overall pass rate at Gen 6 — surpassing ACE-RTL's 88.9% by 6.9 points. On Code Debug, the system achieves a perfect 100% pass rate. Performance improves almost monotonically across generations, validating the self-improving loop at the core of the ADA framework.

Fig. 1 — Pass rate (%) across Gen 0 – Gen 6 on CVDP. Agentrys (blue) shows consistent improvement across all task categories. ACE-RTL SOTA (orange dashed) is the fixed specialized baseline; Claude Opus 4.5 (green circle) is the zero-shot LLM baseline.
Task               Agentrys (Gen 6)   ACE-RTL (SOTA)   Claude Opus 4.5
Overall            95.8%              88.9%            50.1%
Code Completion    96.8%              80.8%            42.4%
Spec-to-RTL        96.2%              96.2%            54.3%
Code Modification  90.9%              90.9%            52.1%
Code Debug         100.0%             91.4%            57.3%

The sharpest gains appear on tasks that demand multi-step reasoning and tool use. On Code Completion, Agentrys leads ACE-RTL by 16 points; on Code Debug, it reaches a perfect score — 8.6 points ahead of SOTA — where agentic iteration over simulation feedback provides the clearest advantage over static, non-improving baselines.

Agent Architecture Evolution

The gains across generations are driven by a systematic evolution of every aspect of agent design, including the architecture itself: from a single-agent system at Gen 0 to a 16-agent hierarchical network at Gen 2. Each generation introduces new coordination mechanisms that increase both the breadth of solution exploration and the rigor of verification.

Fig. 2 — Three generations of agent architecture on CVDP, from a single agent to a 16-agent hierarchical network with communication, debate, and adversarial verification.
Gen 0
Single-Agent
1 agent · Tools · Skills
A single agent handles the full RTL simulation and debug loop end-to-end, reading the spec, writing RTL, simulating, and iterating on failures.
Read Spec → Write RTL → Simulate → Debug Loop
Gen 1
Multi-Agent System
5 agents
An orchestrator coordinates three parallel RTL designers and a reviewer, enabling concurrent solution exploration with quality-gated aggregation.
Orchestrator → RTL Designer ×3 → Reviewer
Gen 2
Hierarchical Agent Network
16 agents
Three independent solver teams compete and debate. An evidence-weighted aggregator and adversarial verifier resolve disagreements before a simulation specialist validates the final output.
Orchestrator → Solver Team ×3 → Aggregator → Adversarial Verifier → Sim Specialist
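The Gen 0 read-spec / write-RTL / simulate / debug loop described above can be sketched as a simple iterate-until-pass control flow. Everything here is a hedged illustration: `generate_rtl` and `run_simulation` are hypothetical stand-ins for an LLM call and a Verilog simulator, not the Agentrys API.

```python
# Illustrative sketch of a Gen 0-style single-agent RTL loop.
# generate_rtl / run_simulation are placeholders, not real Agentrys APIs.

def generate_rtl(spec, feedback=None):
    # Placeholder for an LLM call that writes (or revises) RTL.
    if feedback is None:
        return "module top; endmodule"
    return "module top; /* fixed */ endmodule"

def run_simulation(rtl):
    # Placeholder for compiling and simulating against a testbench.
    # Returns (passed, error_log).
    if "fixed" in rtl:
        return True, ""
    return False, "assertion failed at t=10ns"

def single_agent_loop(spec, max_iters=5):
    feedback = None
    for _ in range(max_iters):
        rtl = generate_rtl(spec, feedback)   # write (or revise) RTL
        passed, log = run_simulation(rtl)    # simulate
        if passed:
            return rtl                       # done
        feedback = log                       # debug loop: feed errors back in
    return None                              # gave up within the budget
```

The later generations keep this inner loop but wrap it in parallel exploration (Gen 1) and competitive aggregation (Gen 2).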

The Gen 2 hierarchical network adds a communication and debate layer: after each solver team independently produces a candidate solution, an evidence-weighted aggregator reconciles the outputs and an adversarial verifier stress-tests the result before a simulation specialist validates it. This pipeline reduces both false positives and convergence failures. Each generation is itself a product of the self-improving loop — built from what the previous generation learned running real design workflows.
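The aggregate-then-verify stage of that pipeline can be sketched as follows. This is a minimal illustration under assumed conventions (candidates carry a list of evidence scores and a verifier flag); none of the function names or data shapes come from the Agentrys implementation.

```python
# Hypothetical sketch of a Gen 2-style aggregate-then-verify stage.
# Candidate format, evidence scoring, and all names are illustrative
# assumptions, not the Agentrys implementation.

def evidence_weighted_aggregate(candidates):
    """Pick the candidate with the most supporting evidence
    (e.g. lint results, partial test passes, reviewer votes)."""
    return max(candidates, key=lambda c: sum(c["evidence"]))

def adversarial_verify(candidate):
    # Placeholder: an adversarial agent would try to construct a
    # failing stimulus; here we just check a precomputed flag.
    return not candidate.get("has_known_weakness", False)

def finalize(candidates):
    """Reconcile solver-team outputs, then stress-test the winner."""
    if not candidates:
        return None
    best = evidence_weighted_aggregate(candidates)
    if not adversarial_verify(best):
        # Verifier found a hole: drop this candidate and retry.
        rest = [c for c in candidates if c is not best]
        return finalize(rest)
    return best  # handed to the simulation specialist for final validation
```

The key design point the sketch captures is that the adversarial verifier can veto the highest-evidence candidate, forcing the aggregator to fall back rather than ship a plausible-but-wrong solution.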

Cite this work

@misc{agentrys2026cvdp,
  title   = {Agentrys: Self-improving Agent Solving CVDP RTL Coding Tasks},
  author  = {Tsai, Yun-Da and Ding, Duo and Li, Wuxi and Ren, Haoxing},
  year    = {2026},
  month   = {February},
  url     = {https://agentrys.ai/blog-cvdp.html}
}