feat: a sequential batch TSS keysign scheduler for EVM chain
Description
Remaining work:
- [ ] unit tests
This PR implements a sequential batch TSS keysign scheduler for EVM chain, improving outbound speed by 4~5X.
- Decouple CCTX process goroutine `scheduleCCTX` and TSS keysign scheduler goroutine `scheduleKeysign`.
- Use an artificial (deterministic) height instead of real ZetaChain height to create TSS keysign request. This improves outbound performance by 2X.
- Schedule TSS keysign for batched digests instead of only one single digest. Reduced total keysign requests number from multiple to only one (per chain).
- Schedule TSS keysign by nonce (batched) sequentially without waiting intervals, replacing the existing interval-based logic `zeta_height % interval == cctx_nonce % interval`. This improves outbound performance by 2~3X.
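To make the last point concrete, here is a rough Go sketch contrasting the two scheduling rules. The interval check is taken from the expression quoted above; the batch helpers are hypothetical. The names `nonceToBatchNumber` and `batchNumberToRange` and the `batchSize` of 10 are assumptions for illustration, not the code in this PR.

```go
package main

// batchSize is an assumed number of nonces per keysign batch; the real value
// used by the PR is not stated in this description.
const batchSize uint64 = 10

// intervalDue mirrors the old rule: a CCTX is only eligible for keysign at
// heights where zetaHeight % interval == nonce % interval, so a nonce may
// wait up to interval-1 blocks before it is scheduled.
func intervalDue(zetaHeight, nonce, interval uint64) bool {
	if interval == 0 {
		return false
	}
	return zetaHeight%interval == nonce%interval
}

// nonceToBatchNumber maps a nonce to the batch that contains it, so batches
// can be signed in order as soon as they are ready, independent of height.
func nonceToBatchNumber(nonce uint64) uint64 {
	return nonce / batchSize
}

// batchNumberToRange returns the inclusive nonce range [low, high] of a batch.
func batchNumberToRange(batch uint64) (low, high uint64) {
	low = batch * batchSize
	return low, low + batchSize - 1
}
```

With these assumed values, nonce 25 falls into batch 2, which covers nonces 20 through 29, whereas under the old rule with `interval = 10` the same nonce would only be eligible at heights ending in 5.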
The eth withdraw stress test result before:
The result after:
Closes https://github.com/zeta-chain/node/issues/4436
How Has This Been Tested?
- [x] Tested CCTX in localnet
- [ ] Tested in development environment
- [ ] Go unit tests
- [ ] Go integration tests
- [ ] Tested via GitHub Actions
📝 Walkthrough
This pull request introduces a batched TSS keysign system to improve EVM chain outbound performance. It refactors signing workflows to use per-nonce digest caching, batch multiple keysigns into a single TSS operation, and eliminates height-based scheduling in favor of a nonce-driven approach with stale-block detection.
Changes
| Cohort / File(s) | Summary |
|---|---|
| Cantor Pairing Utilities: pkg/math/pairing.go, pkg/math/pairing_test.go | Introduces Cantor pairing functions (CantorPair, CantorUnpair) and MaxPairValue constant for mapping uint32 pairs to uint64 values, with comprehensive round-trip testing. |
| Base Signer Batch Infrastructure: zetaclient/chains/base/signer.go, zetaclient/chains/base/signer_batch_info.go, zetaclient/chains/base/signer_batch_sign.go | Adds per-nonce TSS tracking (tssKeysignInfoMap, nextTSSNonce), introduces TSSKeysignInfo and TSSKeysignBatch structures, implements batch accumulation logic, readiness checks, signing workflows, and nonce-to-batch mapping utilities. Changes mu from Mutex to RWMutex for concurrent access. |
| EVM Signer Refactoring: zetaclient/chains/evm/signer/outbound_data.go, zetaclient/chains/evm/signer/outbound_data_test.go, zetaclient/chains/evm/signer/sign.go, zetaclient/chains/evm/signer/sign_test.go, zetaclient/chains/evm/signer/signer.go, zetaclient/chains/evm/signer/signer_admin.go, zetaclient/chains/evm/signer/signer_admin_test.go, zetaclient/chains/evm/signer/signer_test.go, zetaclient/chains/evm/signer/v2_sign.go | Removes height parameter from NewOutboundData and Sign operations. Replaces TSS-based signing with GetSignatureOrAddDigest flow, introducing ErrWaitForSignature for async keysign awaiting. Adds NextTSSNonce method. Updates test infrastructure with digest-based mocking and signature preloading for all signing paths. |
| EVM Chain Scheduler Refactoring: zetaclient/chains/evm/evm.go | Introduces scheduleKeysign method with batch preparation, readiness checks, and sequential batch signing. Refactors scheduleCCTX to use NextTSSNonce instead of tracker-based nonce heuristics, adds stale-block-event skipping logic, simplifies conflict checking. Removes getTrackerSet helper and tracker-based gating. |
| Client and Repository Interface Extensions: zetaclient/chains/tssrepo/client.go, zetaclient/chains/zrepo/client.go, zetaclient/chains/zrepo/zrepo.go, zetaclient/dry/dry.go, zetaclient/testutils/mocks/tss.go, zetaclient/testutils/mocks/zetacore.go, zetaclient/mode/chaos/generated.go, zetaclient/tss/service.go | Adds IsSignatureCached method to TSSClient interface and implementations (TSSService, mocks, chaos). Adds GetBlockHeight method to ZetacoreReaderClient interface and implementations (ZetaRepo, mocks, chaos). Updates mock parameter naming for clarity. |
| Metrics and Monitoring: zetaclient/metrics/metrics.go | Introduces NextTSSNonce gauge metric (per-chain) for observability of TSS account nonce state. |
| Test Performance and Configuration: cmd/zetae2e/local/performance.go, zetaclient/mode/chaos/generate/sample.json | Moves timer start in withdraw performance test to after deposit step, measuring only keysign execution time. Adds GetBlockHeight configuration entry to chaos generator. |
| Documentation: changelog.md | Documents new batch keysign feature for EVM performance improvement. |
| Observer Comments: zetaclient/chains/evm/observer/outbound.go | Adds explanatory comment about batch keysign usage and deprecation of continueKeysign flag. |
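For readers unfamiliar with the Cantor pairing mentioned in the first row, the following is a minimal Go sketch of the standard formula π(a, b) = (a+b)(a+b+1)/2 + b and its inverse. It is not the code in `pkg/math/pairing.go`: the lowercase names, the `maxSum` bound and the error value are assumptions chosen so that the arithmetic stays inside `uint64`, and the real `MaxPairValue` may be defined differently.

```go
package main

import (
	"errors"
	"math"
)

// maxSum is an assumed bound on a+b that keeps the pairing inside uint64.
const maxSum uint64 = math.MaxUint32

// errPairOverflow is a hypothetical error for inputs outside the assumed bound.
var errPairOverflow = errors.New("cantor pair: a+b exceeds assumed bound")

// triangular returns s*(s+1)/2, dividing the even factor first so the
// intermediate product does not overflow for s up to maxSum+1.
func triangular(s uint64) uint64 {
	if s%2 == 0 {
		return (s / 2) * (s + 1)
	}
	return s * ((s + 1) / 2)
}

// cantorPair maps the pair (a, b) to a single uint64.
func cantorPair(a, b uint32) (uint64, error) {
	s := uint64(a) + uint64(b)
	if s > maxSum {
		return 0, errPairOverflow
	}
	return triangular(s) + uint64(b), nil
}

// cantorUnpair inverts cantorPair. It is only meaningful for values that
// cantorPair produced; other inputs may truncate when narrowed to uint32.
func cantorUnpair(z uint64) (a, b uint32) {
	// Floating-point estimate of w = floor((sqrt(8z+1)-1)/2); the loops
	// below correct any off-by-one caused by rounding.
	w := uint64((math.Sqrt(8*float64(z)+1) - 1) / 2)
	for triangular(w) > z {
		w--
	}
	for triangular(w+1) <= z {
		w++
	}
	b64 := z - triangular(w)
	return uint32(w - b64), uint32(b64)
}
```

The pairing is a bijection on its domain, so two distinct input pairs can never produce the same value, which is the property a deterministic keysign height derived from a pair would rely on.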
Sequence Diagram(s)
sequenceDiagram
participant Scheduler as EVM Scheduler
participant Signer as Base Signer
participant Batch as Batch Manager
participant TSS as TSS Service
participant Cache as Signature Cache
Scheduler->>Signer: PrepareForKeysign(zetaHeight, nextNonce)
Signer->>Signer: Check stale blocks
Signer->>Signer: Clean stale keysign info
Signer-->>Scheduler: Ready: bool
alt Batch ready to sign
Scheduler->>Signer: GetKeysignBatch(batchNumber)
Signer->>Batch: Collect digests in nonce range
Batch-->>Signer: TSSKeysignBatch
Signer->>Signer: SignBatch(batch)
Signer->>Signer: Compute keysignHeight via Cantor pairing
Signer->>TSS: SignBatch(digests, height)
TSS->>Cache: Store signatures
Signer->>Signer: AddBatchSignatures(batch, sigs)
else Waiting for signatures
Signer->>Cache: GetSignatureOrAddDigest(nonce, digest)
Cache-->>Signer: (sig [65]byte, found bool)
alt Found in cache
Signer-->>Scheduler: Success
else Not found
Signer-->>Scheduler: ErrWaitForSignature
end
end
sequenceDiagram
participant Client as EVM Client
participant Outbound as OutboundData
participant TSS as Signer (TSS)
participant Batch as Batch Signing
rect rgb(200, 200, 255)
note over Client,TSS: Old Flow: Height-based scheduling
Client->>Outbound: NewOutboundData(ctx, cctx, height, logger)
Outbound-->>Client: OutboundData with height field
Client->>TSS: Sign(ctx, data, ..., height)
TSS->>TSS: Per-nonce TSS keysign
end
rect rgb(200, 255, 200)
note over Client,Batch: New Flow: Batch and digest-based
Client->>Outbound: NewOutboundData(ctx, cctx, logger)
Outbound-->>Client: OutboundData (height from ObservedExternalHeight)
Client->>TSS: GetSignatureOrAddDigest(nonce, digest)
alt Signature cached
TSS-->>Client: (sig, true)
else Waiting on batch
TSS-->>Client: (empty, false)
Client->>Batch: PrepareForKeysign()
Batch->>Batch: Accumulate nonces
Batch->>TSS: SignBatch(digests[])
TSS->>TSS: Single TSS keysign for batch
Batch->>Batch: Cache all signatures
end
end
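The "signature cached / waiting on batch" branch in the diagrams above can be sketched in Go as follows. This is an illustrative model only: the `sigCache` type, its fields, and the lowercase method names are hypothetical stand-ins for the `GetSignatureOrAddDigest` and `ErrWaitForSignature` identifiers named in the walkthrough, and the actual signer may store and invalidate entries differently.

```go
package main

import (
	"errors"
	"sync"
)

// errWaitForSignature is a stand-in for the ErrWaitForSignature named above.
var errWaitForSignature = errors.New("waiting for batch keysign signature")

// keysignInfo holds the digest queued for a nonce and, once the batch has
// been signed, the resulting signature.
type keysignInfo struct {
	digest    [32]byte
	signature [65]byte
	signed    bool
}

// sigCache is a hypothetical per-nonce store guarded by an RWMutex.
type sigCache struct {
	mu    sync.RWMutex
	infos map[uint64]*keysignInfo
}

func newSigCache() *sigCache {
	return &sigCache{infos: make(map[uint64]*keysignInfo)}
}

// getSignatureOrAddDigest returns the cached signature for (nonce, digest),
// or records the digest so that a later batch keysign can pick it up.
func (c *sigCache) getSignatureOrAddDigest(nonce uint64, digest [32]byte) ([65]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if info, ok := c.infos[nonce]; ok && info.digest == digest {
		return info.signature, info.signed
	}
	// New nonce, or the digest changed for this nonce: queue the new digest.
	c.infos[nonce] = &keysignInfo{digest: digest}
	return [65]byte{}, false
}

// addSignature simulates the batch signer storing a result for a nonce.
func (c *sigCache) addSignature(nonce uint64, digest [32]byte, sig [65]byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if info, ok := c.infos[nonce]; ok && info.digest == digest {
		info.signature = sig
		info.signed = true
	}
}

// pendingCount reports how many queued digests are still unsigned.
func (c *sigCache) pendingCount() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	n := 0
	for _, info := range c.infos {
		if !info.signed {
			n++
		}
	}
	return n
}

// sign is the caller-side view: it never blocks on TSS, it either returns a
// cached signature or asks the caller to retry on a later tick.
func (c *sigCache) sign(nonce uint64, digest [32]byte) ([65]byte, error) {
	sig, found := c.getSignatureOrAddDigest(nonce, digest)
	if !found {
		return sig, errWaitForSignature
	}
	return sig, nil
}
```

In this model the first call to `sign` for a nonce returns `errWaitForSignature` and queues the digest; once the batch signer has called `addSignature`, a retry with the same digest returns the signature. A digest that differs from the queued one replaces it, so the caller waits for a new batch.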
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Areas requiring extra attention:
- Concurrency safeguards in batch accumulation: The new RWMutex usage in Signer and per-nonce map (tssKeysignInfoMap) require careful review to ensure lock ordering and prevent deadlocks during concurrent batch operations.
- Cantor pairing correctness: The bidirectional mapping (NonceToBatchNumber, BatchNumberToRange) and KeysignHeight computation using Cantor pairing must be validated for round-trip correctness and absence of collisions.
- Scheduler control flow changes: The replacement of tracker-based nonce heuristics with NextTSSNonce and introduction of stale-block-event detection fundamentally alters CCTX processing order and timing. Verify this does not break existing ordering guarantees or introduce race conditions.
- Asynchronous signature awaiting: The new ErrWaitForSignature signal and GetSignatureOrAddDigest flow introduce eventual-consistency semantics. Verify retry logic, backpressure handling, and that outbounds are not silently dropped or duplicated.
- Test infrastructure alignment: Digest-based mocking across multiple signing paths (sign_test.go, signer_admin_test.go, signer_test.go) must be consistent to avoid false negatives masking real signing failures.
- Interface compliance: New interface methods (IsSignatureCached, GetBlockHeight) added to TSSClient and ZetacoreReaderClient require verification that all implementations (service, mocks, chaos, dry) are correctly updated.
Possibly related PRs
- zeta-chain/node#2357: Involves base Signer and batch signing surface changes, including integration of signer fields and batch SignBatch method introduction with related refactoring.
Suggested reviewers
- skosito
- lumtis
- brewmaster012
- kingpinXD
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 29.41% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main architectural change: implementing a sequential batch TSS keysign scheduler for EVM chains, which is the primary objective of the changeset. |
| Description check | ✅ Passed | The description is comprehensive and covers the four key architectural changes, performance improvements, test results, and linked issue. However, testing coverage is incomplete: only localnet testing is checked while Go unit/integration tests and GitHub Actions are unchecked. |
| Linked Issues check | ✅ Passed | The changeset directly addresses issue #4436 by implementing batched keysign scheduling and deterministic height computation to reduce excessive TSS requests and improve throughput consistency. |
| Out of Scope Changes check | ✅ Passed | All changes remain within scope: EVM batch keysign implementation, supporting utilities (Cantor pairing, batch info structures), interface extensions (IsSignatureCached, GetBlockHeight), and related test updates. The changelog entry is documentation. |
!!!WARNING!!!
nosec detected in the following files: pkg/math/pairing.go, zetaclient/chains/base/signer_batch_info.go, zetaclient/chains/base/signer_batch_info_test.go, zetaclient/chains/base/signer_batch_sign.go, zetaclient/chains/base/signer_batch_sign_test.go, zetaclient/chains/evm/evm.go, zetaclient/tss/service.go
Be very careful about using #nosec in code. It can be a quick way to suppress security warnings and move forward with development, but it should be employed with caution. Suppressing warnings with #nosec can hide potentially serious vulnerabilities. Only use #nosec when you're absolutely certain that the security issue is either a false positive or has been mitigated in another way.
Only suppress a single rule (or a specific set of rules) within a section of code, while continuing to scan for other problems. To do this, you can list the rule(s) to be suppressed within the #nosec annotation, e.g: /* #nosec G401 */ or //#nosec G201 G202 G203
Broad #nosec annotations should be avoided, as they can hide other vulnerabilities. The CI will block you from merging this PR until you remove #nosec annotations that do not target specific rules.
Pay extra attention to the way #nosec is being used in the files listed above.
Codecov Report
:x: Patch coverage is 70.63712% with 106 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 64.85%. Comparing base (d8a6ec2) to head (afae2de).
:warning: Report is 1 commits behind head on develop.
Additional details and impacted files
@@ Coverage Diff @@
## develop #4427 +/- ##
===========================================
+ Coverage 64.71% 64.85% +0.14%
===========================================
Files 469 472 +3
Lines 28574 28797 +223
===========================================
+ Hits 18491 18677 +186
- Misses 9064 9096 +32
- Partials 1019 1024 +5
| Files with missing lines | Coverage Δ | |
|---|---|---|
| pkg/math/pairing.go | 100.00% <100.00%> (ø) | |
| pkg/scheduler/context.go | 22.72% <100.00%> (ø) | |
| pkg/scheduler/tickers.go | 81.96% <100.00%> (ø) | |
| zetaclient/chains/evm/observer/outbound.go | 63.28% <ø> (ø) | |
| zetaclient/chains/evm/signer/outbound_data.go | 66.01% <100.00%> (-0.33%) | :arrow_down: |
| zetaclient/chains/evm/signer/signer_admin.go | 83.69% <100.00%> (-1.31%) | :arrow_down: |
| zetaclient/metrics/metrics.go | 68.08% <ø> (ø) | |
| zetaclient/tss/service.go | 50.26% <100.00%> (+1.93%) | :arrow_up: |
| zetaclient/chains/evm/signer/sign.go | 65.04% <75.00%> (-0.41%) | :arrow_down: |
| zetaclient/chains/zrepo/zrepo.go | 42.01% <0.00%> (-0.51%) | :arrow_down: |
| ... and 9 more | | |