zk-regex icon indicating copy to clipboard operation
zk-regex copied to clipboard

Feat/new compiler

Open shreyas-londhe opened this issue 8 months ago • 4 comments

[!NOTE] Replaces the old Rust compiler and extensive Circom tests with a new Bun/TypeScript scripts suite (gen-regex, gen-inputs), shared utilities, and Jest-based tests.

  • Tooling/Scripts (new):
    • Add Bun/TypeScript scripts for generating regex artifacts: circom/scripts/gen-regex.ts, noir/scripts/gen-regex.ts, noir/scripts/gen-inputs.ts.
    • Introduce shared utilities (utils/*) for logging, file ops, subprocess, and types.
    • Add project config: tsconfig.json, jest.config.ts, jest.setup.ts, bun.lock, and package.json.
  • Tests (new):
    • Add Jest integration and unit tests under scripts/__tests__ for script workflows and utilities.
  • Compiler (removed):
    • Remove legacy Rust compiler package (packages/compiler/*).
  • Circom Tests (removed):
    • Remove numerous legacy Circom test circuits and test cases under packages/circom/tests/*.

Written by Cursor Bugbot for commit 3d37c3110fbd1baa96d048ab658bcb47acd8731b. This will update automatically on new commits. Configure here.

Summary by CodeRabbit

  • New Features
    • Adds many new regex verification circuits for email headers and addresses and a compiler to generate them; Noir support added alongside Circom.
  • Infrastructure & Build
    • New CI/workflow and pre-push hook to keep generated templates in sync; new package/tsconfig/bundler configs for tooling.
  • Documentation
    • Major README overhaul and new docs for compiler, Circom, and Noir subprojects with usage and troubleshooting.
  • Tests
    • End-to-end circuit tests and sample inputs added for common email regexes.
  • Chores
    • Simplified lint/format configs and updated workspace/manifest and .gitignore rules.

shreyas-londhe avatar Mar 28 '25 15:03 shreyas-londhe

@cursor please review this

BornPsych avatar Sep 30 '25 15:09 BornPsych

I've thoroughly reviewed the "Feat/new compiler" PR. I started by exploring the codebase, examining package.json files for dependency changes, and then delved into the compiler's structure, error handling, and test files. I also paid close attention to the Socket security report and the Cargo.toml files for Rust dependencies.

Here's a summary of my findings:

This PR introduces a substantial new ZK-Regex compiler with a well-structured architecture, robust error handling, and support for multiple proving frameworks (Circom and Noir). Key strengths include:

  • Architecture & Design: Modular structure, clear API, and support for multiple frameworks.
  • Error Handling: Comprehensive error taxonomy with user-friendly messages.
  • Developer Experience: Excellent CLI tool, automatic template regeneration, and good documentation.
  • Code Quality: Follows Rust best practices with clean abstractions.

However, I identified several areas for improvement and potential issues:

  • Security Concerns: The Socket report highlighted the addition of @zk-email/[email protected] and an upgrade of circom_tester. Using alpha versions in production is a risk.
  • Test Coverage: I found no unit tests in the Rust compiler code, and integration tests appear limited.
  • Dependencies: There's a dependency on a specific GitHub branch (comptime from feat/comptime-codegen), which could lead to instability.
  • Documentation: Some inline documentation is missing, and there's no architecture diagram.

My main recommendations are to:

  • Add comprehensive unit tests for core compiler functionality.
  • Replace alpha dependencies with stable versions.
  • Address the external GitHub branch dependency.
  • Add performance benchmarks.

Overall, this is a high-quality PR, but the critical issues regarding tests and dependencies should be addressed before merging.


5 tasks completed

  • [x] Review PR changes for new compiler feature
  • [x] Analyze new dependencies added (@types/bun, @zk-email/circuits)
  • [x] Review Socket security report findings
  • [x] Verify new compiler implementation and integration
  • [x] Provide comprehensive review feedback Open in Cursor Open in Web

Learn more about Cursor Agents

cursor[bot] avatar Sep 30 '25 15:09 cursor[bot]

[!NOTE]

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a Rust-based zk-regex compiler, backend generators for Circom and Noir, IR and pass pipelines, wasm bindings, Circom/Noir artifacts and tests, developer tooling (scripts, hooks, CI), many generated Circom circuits/graphs/regexes, and project configuration/file removals and reorganizations.

Changes

Cohort / File(s) Summary
Removed configs
\.cargo/config``, \.eslintrc.json``, \.eslintignore``, babel.config.js, \.prettierrc
Deleted legacy tooling/config files (cargo, ESLint, Babel, Prettier)
Repo config & ignores
\Cargo.toml``, \.gitignore``, `bunfig.toml``
Reworked workspace Cargo manifest, updated .gitignore entries, added Bun test timeout config
CI workflows
\.github/workflows/test.yaml, \.github/workflows/test.yml
Removed old workflow and added new Rust/Bun-based GitHub Actions workflow (Circom install, Bun runner, tests)
Root docs & metadata
\README.md``, \LICENSE``
Large README rewrite and minor license text tweak
Compiler crate & packaging
compiler/Cargo.toml, compiler/package.json, compiler/README.md
New compiler crate manifest, npm package metadata, and compiler README
Compiler library & API
compiler/src/lib.rs, compiler/src/driver.rs, compiler/src/error.rs, compiler/src/types.rs, compiler/src/wasm.rs, compiler/src/utils.rs
New public compiler API, structured errors, wasm bindings, utilities for composing/generating outputs
IR: NFA & intermediate
compiler/src/ir/* (graph.rs, intermediate.rs, nfa.rs, mod.rs)
New IR: NFANode/NFAGraph, IntermediateNFA, epsilon-elimination, path-finding, verification and serialization
Passes & errors
compiler/src/passes/* (builder.rs, error.rs, mod.rs)
New builder pass converting Thompson NFA → Intermediate → NFAGraph and NFA error types mapped to CompilerError
Backend generators
compiler/src/backend/* (mod.rs, circom.rs, noir.rs, shared.rs)
Circom and Noir code generators and shared extraction logic for transitions/captures; Circom code emitter API
CLI binary
compiler/src/bin/zk-regex.rs
New CLI with subcommands: Decomposed, Raw, GenerateCircuitInput
Gen / scripting
circom/scripts/gen-regex.ts, git-hooks/pre-push
Script to generate Circom circuits from JSON using the compiler; pre-push hook enforcing template regeneration
Circom package & configs
circom/package.json, circom/tsconfig.json, circom/bunfig.toml, circom/index.ts, circom/.gitignore
New Circom package manifest, TS config, Bun config, entry log, and Circom-specific .gitignore
Circom regex helpers & templates
circom/circuits/regex_helpers.circom, circom/circuits/common/*.circom
Added MultiOR, transition checks, CaptureSubstring and many generated Circom templates implementing DFA-based regexes (email, subject, body_hash, message_id, to/from, simple, reversed_bracket, etc.)
Circom DFA graphs & regexes
circom/circuits/common/*_graph.json, circom/regexes/*.json
Added generated DFA JSON graphs and source regex JSON descriptors for each pattern
Circom tests
circom/circuits/tests/*, circom/circuits/tests/*.test.ts
Added Circom test circuits and TypeScript test suites exercising generated circuits (body_hash, email_addr, subject_all, etc.)
Noir integration & samples
noir/Nargo.toml, noir/README.md, noir/common/*.json, noir/common/sample_haystacks/**
Added Noir project manifest, docs, pattern JSONs, and sample haystacks / circuit input traces
Repo tooling & docs
CLAUDE.md, circom/README.md
Added developer documentation and Circom-specific README

Sequence Diagram(s)

%%{init: { "themeVariables": { "actorBkg": "#E8F6FF", "actorBorder": "#8CCBEA", "noteBkg": "#F6F8FA" }}}%%
sequenceDiagram
    participant User
    participant CLI as zk-regex CLI
    participant Driver
    participant Builder as NFABuilder
    participant Backend as CodeGen
    participant Filesystem

    User->>CLI: gen-from-raw(pattern, template, framework)
    CLI->>Driver: Driver::compile(pattern, config)
    Driver->>Builder: NFAGraph::build(pattern)
    Builder->>Builder: Thompson NFA → IntermediateNFA → NFAGraph
    Builder->>Driver: NFAGraph (graph JSON)
    Driver->>Backend: generate_code(nfa, template, pattern, max_bytes)
    Backend->>Filesystem: emit Circom/Noir code & graph
    Filesystem-->>User: (graph.json, template.circom / template.nr)

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Areas requiring extra attention:

  • compiler/src/ir/* (IntermediateNFA, epsilon elimination, path-finding)
  • compiler/src/backend/* (Circom/Noir emission correctness, capture handling)
  • compiler/src/wasm.rs (error translation & JS bindings)
  • circom/circuits/common/*_regex.circom (per-byte transition logic and captures)
  • git-hooks/pre-push (stashing, build-and-regenerate flow)

Possibly related PRs

  • zkemail/zk-regex#110 — modifies epsilon-closure/capture handling in compiler IR; strongly related to intermediate/NFA capture semantics.

Suggested reviewers

  • BornPsych

Poem

🐇 In burrows of bytes I nimbly dig,

Patterns to circuits, neat and big.
States hop, captures snug and warm,
Proofs bloom from regex into storm —
Hooray! A compiled trail, a rabbit's jig. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Feat/new compiler' is vague and does not clearly convey the main changes in the PR. While it mentions a compiler feature, it fails to specify that this replaces the old compiler, introduces Bun/TypeScript scripts, and significantly restructures the tooling. Consider a more descriptive title such as 'Rewrite compiler in Rust with Bun/TypeScript tooling' or 'Replace legacy compiler with new Rust-based architecture' to better communicate the scope and nature of the changes.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment
  • [ ] Commit unit tests in branch feat/new-compiler

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Oct 31 '25 09:10 coderabbitai[bot]