zk-regex
zk-regex copied to clipboard
Feat/new compiler
[!NOTE] Replaces the old Rust compiler and extensive Circom tests with a new Bun/TypeScript scripts suite (gen-regex, gen-inputs), shared utilities, and Jest-based tests.
- Tooling/Scripts (new):
- Add Bun/TypeScript scripts for generating regex artifacts:
circom/scripts/gen-regex.ts,noir/scripts/gen-regex.ts,noir/scripts/gen-inputs.ts.- Introduce shared utilities (
utils/*) for logging, file ops, subprocess, and types.- Add project config:
tsconfig.json,jest.config.ts,jest.setup.ts,bun.lock, andpackage.json.- Tests (new):
- Add Jest integration and unit tests under
scripts/__tests__for script workflows and utilities.- Compiler (removed):
- Remove legacy Rust compiler package (
packages/compiler/*).- Circom Tests (removed):
- Remove numerous legacy Circom test circuits and test cases under
packages/circom/tests/*.Written by Cursor Bugbot for commit 3d37c3110fbd1baa96d048ab658bcb47acd8731b. This will update automatically on new commits. Configure here.
Summary by CodeRabbit
- New Features
- Adds many new regex verification circuits for email headers and addresses and a compiler to generate them; Noir support added alongside Circom.
- Infrastructure & Build
- New CI/workflow and pre-push hook to keep generated templates in sync; new package/tsconfig/bundler configs for tooling.
- Documentation
- Major README overhaul and new docs for compiler, Circom, and Noir subprojects with usage and troubleshooting.
- Tests
- End-to-end circuit tests and sample inputs added for common email regexes.
- Chores
- Simplified lint/format configs and updated workspace/manifest and .gitignore rules.
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
@cursor please review this
I've thoroughly reviewed the "Feat/new compiler" PR. I started by exploring the codebase, examining package.json files for dependency changes, and then delved into the compiler's structure, error handling, and test files. I also paid close attention to the Socket security report and the Cargo.toml files for Rust dependencies.
Here's a summary of my findings:
This PR introduces a substantial new ZK-Regex compiler with a well-structured architecture, robust error handling, and support for multiple proving frameworks (Circom and Noir). Key strengths include:
- Architecture & Design: Modular structure, clear API, and support for multiple frameworks.
- Error Handling: Comprehensive error taxonomy with user-friendly messages.
- Developer Experience: Excellent CLI tool, automatic template regeneration, and good documentation.
- Code Quality: Follows Rust best practices with clean abstractions.
However, I identified several areas for improvement and potential issues:
- Security Concerns: The Socket report highlighted the addition of
@zk-email/[email protected]and an upgrade ofcircom_tester. Using alpha versions in production is a risk. - Test Coverage: I found no unit tests in the Rust compiler code, and integration tests appear limited.
- Dependencies: There's a dependency on a specific GitHub branch (
comptimefromfeat/comptime-codegen), which could lead to instability. - Documentation: Some inline documentation is missing, and there's no architecture diagram.
My main recommendations are to:
- Add comprehensive unit tests for core compiler functionality.
- Replace alpha dependencies with stable versions.
- Address the external GitHub branch dependency.
- Add performance benchmarks.
Overall, this is a high-quality PR, but the critical issues regarding tests and dependencies should be addressed before merging.
5 tasks completed
- [x] Review PR changes for new compiler feature
- [x] Analyze new dependencies added (@types/bun, @zk-email/circuits)
- [x] Review Socket security report findings
- [x] Verify new compiler implementation and integration
- [x] Provide comprehensive review feedback
Learn more about Cursor Agents
[!NOTE]
Other AI code review bot(s) detected
CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Walkthrough
Adds a Rust-based zk-regex compiler, backend generators for Circom and Noir, IR and pass pipelines, wasm bindings, Circom/Noir artifacts and tests, developer tooling (scripts, hooks, CI), many generated Circom circuits/graphs/regexes, and project configuration/file removals and reorganizations.
Changes
| Cohort / File(s) | Summary |
|---|---|
Removed configs \.cargo/config``, \.eslintrc.json``, \.eslintignore``, babel.config.js, \.prettierrc |
Deleted legacy tooling/config files (cargo, ESLint, Babel, Prettier) |
Repo config & ignores \Cargo.toml``, \.gitignore``, `bunfig.toml`` |
Reworked workspace Cargo manifest, updated .gitignore entries, added Bun test timeout config |
CI workflows \.github/workflows/test.yaml, \.github/workflows/test.yml |
Removed old workflow and added new Rust/Bun-based GitHub Actions workflow (Circom install, Bun runner, tests) |
Root docs & metadata \README.md``, \LICENSE`` |
Large README rewrite and minor license text tweak |
Compiler crate & packaging compiler/Cargo.toml, compiler/package.json, compiler/README.md |
New compiler crate manifest, npm package metadata, and compiler README |
Compiler library & API compiler/src/lib.rs, compiler/src/driver.rs, compiler/src/error.rs, compiler/src/types.rs, compiler/src/wasm.rs, compiler/src/utils.rs |
New public compiler API, structured errors, wasm bindings, utilities for composing/generating outputs |
IR: NFA & intermediate compiler/src/ir/* (graph.rs, intermediate.rs, nfa.rs, mod.rs) |
New IR: NFANode/NFAGraph, IntermediateNFA, epsilon-elimination, path-finding, verification and serialization |
Passes & errors compiler/src/passes/* (builder.rs, error.rs, mod.rs) |
New builder pass converting Thompson NFA → Intermediate → NFAGraph and NFA error types mapped to CompilerError |
Backend generators compiler/src/backend/* (mod.rs, circom.rs, noir.rs, shared.rs) |
Circom and Noir code generators and shared extraction logic for transitions/captures; Circom code emitter API |
CLI binary compiler/src/bin/zk-regex.rs |
New CLI with subcommands: Decomposed, Raw, GenerateCircuitInput |
Gen / scripting circom/scripts/gen-regex.ts, git-hooks/pre-push |
Script to generate Circom circuits from JSON using the compiler; pre-push hook enforcing template regeneration |
Circom package & configs circom/package.json, circom/tsconfig.json, circom/bunfig.toml, circom/index.ts, circom/.gitignore |
New Circom package manifest, TS config, Bun config, entry log, and Circom-specific .gitignore |
Circom regex helpers & templates circom/circuits/regex_helpers.circom, circom/circuits/common/*.circom |
Added MultiOR, transition checks, CaptureSubstring and many generated Circom templates implementing DFA-based regexes (email, subject, body_hash, message_id, to/from, simple, reversed_bracket, etc.) |
Circom DFA graphs & regexes circom/circuits/common/*_graph.json, circom/regexes/*.json |
Added generated DFA JSON graphs and source regex JSON descriptors for each pattern |
Circom tests circom/circuits/tests/*, circom/circuits/tests/*.test.ts |
Added Circom test circuits and TypeScript test suites exercising generated circuits (body_hash, email_addr, subject_all, etc.) |
Noir integration & samples noir/Nargo.toml, noir/README.md, noir/common/*.json, noir/common/sample_haystacks/** |
Added Noir project manifest, docs, pattern JSONs, and sample haystacks / circuit input traces |
Repo tooling & docs CLAUDE.md, circom/README.md |
Added developer documentation and Circom-specific README |
Sequence Diagram(s)
%%{init: { "themeVariables": { "actorBkg": "#E8F6FF", "actorBorder": "#8CCBEA", "noteBkg": "#F6F8FA" }}}%%
sequenceDiagram
participant User
participant CLI as zk-regex CLI
participant Driver
participant Builder as NFABuilder
participant Backend as CodeGen
participant Filesystem
User->>CLI: gen-from-raw(pattern, template, framework)
CLI->>Driver: Driver::compile(pattern, config)
Driver->>Builder: NFAGraph::build(pattern)
Builder->>Builder: Thompson NFA → IntermediateNFA → NFAGraph
Builder->>Driver: NFAGraph (graph JSON)
Driver->>Backend: generate_code(nfa, template, pattern, max_bytes)
Backend->>Filesystem: emit Circom/Noir code & graph
Filesystem-->>User: (graph.json, template.circom / template.nr)
Estimated code review effort
🎯 5 (Critical) | ⏱️ ~120 minutes
Areas requiring extra attention:
- compiler/src/ir/* (IntermediateNFA, epsilon elimination, path-finding)
- compiler/src/backend/* (Circom/Noir emission correctness, capture handling)
- compiler/src/wasm.rs (error translation & JS bindings)
- circom/circuits/common/*_regex.circom (per-byte transition logic and captures)
- git-hooks/pre-push (stashing, build-and-regenerate flow)
Possibly related PRs
- zkemail/zk-regex#110 — modifies epsilon-closure/capture handling in compiler IR; strongly related to intermediate/NFA capture semantics.
Suggested reviewers
- BornPsych
Poem
🐇 In burrows of bytes I nimbly dig,
Patterns to circuits, neat and big.
States hop, captures snug and warm,
Proofs bloom from regex into storm —
Hooray! A compiled trail, a rabbit's jig. 🥕
Pre-merge checks and finishing touches
❌ Failed checks (1 inconclusive)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'Feat/new compiler' is vague and does not clearly convey the main changes in the PR. While it mentions a compiler feature, it fails to specify that this replaces the old compiler, introduces Bun/TypeScript scripts, and significantly restructures the tooling. | Consider a more descriptive title such as 'Rewrite compiler in Rust with Bun/TypeScript tooling' or 'Replace legacy compiler with new Rust-based architecture' to better communicate the scope and nature of the changes. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
✨ Finishing touches
- [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
- [ ] Commit unit tests in branch
feat/new-compiler
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.