Implementation-Independent Ledger Conformance Test Suite
Motivation
The availability of the Plutus Conformance Test Suite has fostered the development of numerous Plutus implementations in Rust, Python, JavaScript, Go, and C++. These implementations support diverse community use cases and enable experimentation with various aspects of Plutus, including performance.
The community is actively working on Cardano Ledger implementations in Rust, Go, and C++. However, without a high-quality, implementation-independent test suite like the one available for Plutus, each implementation must rely on its own custom tests. This increases the risk of non-conformant implementations being actively used simply due to the lack of a universally available conformance test suite. Furthermore, a language-independent test suite would facilitate collaboration, allowing contributions from any team to benefit all implementations.
The task consists of two stages:
- Alignment: Discuss and define the requirements and general approach for an implementation-independent test suite.
- Development: Build the test suite, with the C++ implementation committing a share of the necessary resources to support its development.
More details follow. Feedback and suggestions are welcome.
Requirements
- The test suite must be language- and environment-agnostic.
- It must be reasonably simple to use in major programming languages.
- It must follow a black-box approach, evaluating outputs based on given inputs, to enable experimentation with alternative algorithms such as batching and parallelization.
- It should support parallel execution of individual tests to leverage modern hardware and shorten development cycles.
Assumptions Driving the Proposed Approach
The initial proposed approach is based on the following assumptions:
- Every ledger implementation requires a functional CBOR decoder and corresponding parsers for Cardano block formats.
- A ledger implementation can be fully modeled as a black box that processes a sequence of blocks and produces a new ledger state.
- A common approach to testing a new ledger implementation is to process blocks from Cardano’s mainnet and testnets. The results are then compared against those produced by the reference implementation (Cardano Node). Therefore, it should be possible to leverage some of that data when creating the test suite.
- The formal specification of the Cardano Ledger serves as the authoritative source for ledger rules and should be used to generate a comprehensive set of test cases. However, automatic source code generation from the formal specification is limited to only a few programming languages, thereby affecting the test suite’s primary objective: implementation independence.
Proposed approach
- Test case inputs should use the standard CBOR format used for storing blocks on the Cardano mainnet.
- Test case outputs should use the state snapshot format from the latest stable version of the Cardano Node, which is also CBOR-encoded. This allows for easy benchmarking against the reference implementation. Additionally, if the snapshot format changes, regenerating outputs is straightforward, as test case inputs remain standard Cardano blocks.
- Each test case consists of an initial ledger state (including the genesis configuration), a sequence of blocks, and a final ledger state.
- Initial test cases can be created by sampling blocks from the Cardano mainnet and modifying them as needed to ensure consistency of generated sequences with ledger specification rules.
- A comprehensive test set can be generated either programmatically, using a block generator based on the formal ledger specification, or manually, depending on cost-effectiveness.
Hey @sierkov, we're fully on board with this idea as part of the Amaru project and have even already started working towards it. So let me outline a few of the ongoing efforts:
I - Simulation testing
Inspired by the Maelstrom project from Jepsen, we're working on a simulation engine for node-to-node interactions that is, unlike Jepsen's, deterministic and language-agnostic. The idea is to leverage message-passing between processes and simulate faults between them.
This effort is particularly focused on the overall consensus behavior of nodes, with the hope to offer a validation bench for the Ouroboros protocols as a whole. See more under pragma-org/simulation-testing.
II - Ledger snapshots
To test conformance of the Amaru ledger against the Haskell ledger, we compare epoch snapshots produced from both ends. We currently roll with two snapshots:
- Stake distribution snapshots: contain account balances per stake credential, as well as their delegation. They also contain pool-specific information such as protocol parameters, blocks produced during the epoch, and overall active stake. See this example.
- Rewards summary snapshots: contain information more specific to the rewards calculation, such as the treasury and reserves amounts during the epoch, the pools' global efficiency (eta), and the values of the rewards pot and leader rewards for each pool. See another example here.
As you can see, those snapshots are in JSON, mainly for two reasons:
- They're easier to format this way and to compare as diffs for testing. Performance isn't much of a concern here since they're only serialized this way for testing.
- It allows us to more easily document them (using JSON Schema and the myriad of visualisation tools that come with it).
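As a toy illustration of that diff-based comparison, here is a minimal Python sketch (file names are placeholders, not Amaru's actual layout):

```python
import difflib
import json

def snapshot_diff(ours: str, theirs: str) -> str:
    """Pretty-print two JSON snapshots with sorted keys so that a
    plain unified diff highlights exactly the fields that differ."""
    def lines(path: str) -> list[str]:
        with open(path) as f:
            return json.dumps(json.load(f), indent=2, sort_keys=True).splitlines()
    return "\n".join(difflib.unified_diff(
        lines(ours), lines(theirs), fromfile=ours, tofile=theirs, lineterm=""))

print(snapshot_diff("amaru-snapshot.json", "haskell-snapshot.json"))
```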
We have yet to document the steps to produce them from a Haskell node, but in brief: we leverage the mini-protocols and dump the NewEpochState at specific points, then slightly re-arrange the data into a more suitable format.
III - Test vectors
Overall, we're also often taking specific CBOR-serialised blocks from mainnet, preprod, or preview as an easy source of validation. This strategy is however sub-optimal at this point because: (a) we do not yet measure coverage, so there's little guarantee that we actually cover the entire serialiser, and (b) it only ever puts the SUT in front of valid data, and never in front of potentially problematic data.
So we are also, when necessary, generating test vectors from the Haskell codebase containing both valid and invalid scenarios. This has been the case, for example, for block headers, which we used to validate both the deserialisation parts and the KES / VRF primitives with specific adversarial mutations.
All-in-all, as you can see, we try to keep those things as language-agnostic as possible, with the idea to upstream some of them to the Cardano blueprint led by @ch1bo.
In principle it shouldn't be too difficult to turn the formal specification into one that can operate on CBOR serialized data. We already have a translation for most datatypes that appear in blocks and states between Haskell and Agda, and obviously the ledger can deserialize CBOR into the Haskell types. So just generating a trace checker out of this should be quite doable in not too much time. I'd assume at least half of the effort would just be getting all the build infrastructure going. This is actually very closely related to a project that I'd really like to do at some point, which is to verify mainnet using the Agda formalization. There are multiple ways to do that, but this would be one.
The bigger problems with this approach would be completeness (there are some things missing from the ledger formalization) and how to actually generate good traces. Completeness is obviously quite high on our priority list but it will take a while. For generating good traces, you could probably also reuse some of the ledger infrastructure that exists, but I don't know how difficult that would be.
@WhatisRT
> it shouldn't be too difficult to turn the formal specification into one that can operate on CBOR serialized data
You mean to say that you are volunteering for this task 😶 ?
> The bigger problems with this approach would be completeness
For lack of better tools, code coverage can at least give some answer (although it will only highlight the obvious non-covered paths). We can at least start with that?
> You mean to say that you are volunteering for this task 😶 ?
I don't really have time to work on this myself (too busy with Leios), but if there is an ask from the community I'm sure it can be prioritized.
> For lack of better tools, code coverage can at least give some answer (although it will only highlight the obvious non-covered paths). We can at least start with that?
Absolutely! What we have right now works quite well for conformance testing the implementation, and of course completeness is exactly the same problem there.
Thanks for opening this discussion @sierkov and for pointing me to it @KtorZ.
> You mean to say that you are volunteering for this task 😶 ?

> I don't really have time to work on this myself (too busy with Leios), but if there is an ask from the community I'm sure it can be prioritized
@WhatisRT @KtorZ @sierkov I would be interested in taking a stab at this and have capacity in the course of the cardano-blueprint initiative, if you confirm something like the following would be used by you?
I do understand this would roughly mean:
- Generalize conformance tests (bi-simulation) of `cardano-ledger` and `ledger-formal-specifications` such that they can be used by @KtorZ and @sierkov from their respective Rust and C++ environments
  - Executable-in-the-loop?
  - Compile Rust/C++ with C bindings into a test driver?
- Only operate on the outermost level, i.e. the `LEDGER` rule that takes a ledger state and a transition, which is a `Tx`
  - This is quite black-boxy, but maybe enough for a first prototype
- What format should the `LedgerState` have to be implementation-independent?
- Ability to run conformance tests against
  - Generated CBOR using generators of `cardano-ledger` - currently done in their tests?
  - Historic chain data - which data and format to use?
I would love to have input from @lehins on these things too.
> Only operate on the outermost level, i.e. the `LEDGER` rule that takes a ledger state and a transition, which is a `Tx`
> - This is quite black-boxy, but maybe enough for a first prototype
I'd at least put something like TICK in an MVP as well, so you can test going across the epoch boundary.
In principle it shouldn't be too difficult to provide tests for other subsystems as well, but their usefulness may depend on implementation details. We've also had this problem when conformance testing the Haskell implementation against the spec, where certain things happen at different places of the hierarchy. To deal with this, we've provided a 'Conformance' version of the spec, which has an equivalence proof to the actual spec and aligns better with the implementation. We could do something similar for other implementations that want to structure their logic differently, but that's a relatively big piece of work.
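To make that boundary concrete, here is a minimal sketch (Python; the names `apply_tx` and `tick` are illustrative assumptions, not an agreed API) of the two entry points such an MVP could expose:

```python
from typing import Any, Protocol

class LedgerApi(Protocol):
    """Hypothetical black-box surface for a ledger implementation:
    the LEDGER rule applies one transaction, TICK steps time forward
    (including across epoch boundaries)."""

    def apply_tx(self, state: Any, tx_cbor: bytes) -> Any:
        """Validate one CBOR-encoded transaction and return the new state."""
        ...

    def tick(self, state: Any, slot: int) -> Any:
        """Advance the state to `slot`, running any epoch-boundary logic."""
        ...
```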
@KtorZ, @WhatisRT, @ch1bo, thank you very much for the additional context and ideas. Given the many points raised, I’ve structured my response into the following sections for clarity:
- Project context – An overview of the C++ ledger implementation's testing and areas we’d like to address with this suite.
- Data format – The benefits of using blocks as inputs and the complete ledger state as outputs, and the story of why the project transitioned from JSON to CBOR ledger snapshots.
- Alignment process – A proposal for aligning on requirements and approach moving forward.
- Additional thoughts – Comments on discussed topics not covered above.
TL;DR The most important section is 'Alignment Process,' particularly the debate over design alternatives and the confirmation of primary areas for initial exploration. I look forward to your feedback.
Project context
The C++ implementation originates from research on parallelizing the most time-consuming operations in Cardano Node, with a focus on batch synchronization (e.g., processing more than one epoch, ~20k+ blocks at a time). The initial goal was to demonstrate that, on mainnet data, it could leverage more powerful hardware to produce identical outputs significantly faster (targeting a 10x speedup).
To verify that the implementation produces the same ledger state, we follow this procedure:
- A Cardano Node instance runs in a virtual container without Internet access.
- A script modifies ImmutableDB files to provide one epoch of historical data at a time.
- The node is restarted, processes the new data, and generates an updated state snapshot.
- The snapshots, captured at the last slot of each complete epoch, are stored (~600 GB in CBOR format).
- Another script generates ledger snapshots for several recent epochs using our implementation and compares them to Cardano Node’s using a CBOR diff algorithm.
- If differences are found, a binary search determines the first diverging epoch (a sketch follows this list).
- All structures must match byte-for-byte, except for non-myopic pool likelihoods (float32), where equality is checked up to four significant digits.
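A minimal sketch of that divergence search (Python; `snapshots_match` is a hypothetical predicate that regenerates and compares the snapshots for a given epoch):

```python
def first_diverging_epoch(lo: int, hi: int, snapshots_match) -> int:
    """Binary search for the first epoch whose snapshots differ.
    Invariant: the snapshot for epoch `lo` matches, the one for `hi` differs."""
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if snapshots_match(mid):
            lo = mid  # divergence happens after `mid`
        else:
            hi = mid  # divergence happens at or before `mid`
    return hi
```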
Benefits of This Approach
- Tests against complete mainnet data (millions of blocks), ensuring feature compatibility.
- Provides confidence in performance comparisons.
- Quickly adapts to new ledger behavior (e.g., Conway voting).
- Easily extendable to Cardano testnet data.
Downsides to Address with the Proposed Test Suite
- Does not test negative cases (blocks rejected by the ledger).
- Lacks edge case testing (valid per spec but never seen on mainnet).
- Does not assess adversarial behavior (potential attack scenarios).
Data format
Complete blockchain blocks as inputs
Our mental model aligns with Amaru’s simulation testing approach. Since Cardano network protocol messages affecting the ledger use blocks, it would be more practical if the conformance test suite were based on blockchain blocks as input. This approach enables simulation in both networked and standalone environments.
Thus, an input format consisting of a file containing a sequence of 0 or more fully formed Cardano blocks is proposed. A real-world example of a valid input file is a chunk file from Cardano Node’s ImmutableDB or VolatileDB.
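As a sketch of how little tooling such an input format demands, the following Python snippet (using the `cbor2` library, and assuming the file is a plain concatenation of CBOR items, which glosses over the era wrapping of real chunk files) yields one decoded block at a time:

```python
import cbor2

def read_blocks(path):
    """Yield CBOR items from a file that stores a raw concatenation
    of encoded blocks, one after another, until end of file."""
    with open(path, "rb") as f:
        decoder = cbor2.CBORDecoder(f)
        while f.peek(1):  # bytes remaining to decode
            yield decoder.decode()
```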
To illustrate why blocks are preferable to transactions (as suggested by @ch1bo), consider a block with zero transactions:
- If such blocks must be discarded, we need a way to explicitly test this behavior.
- If such blocks must be accepted, the ledger state still changes because some components are influenced by block structure rather than transactions.
- For example, the latest slot affects pulsing computations such as rewards and voting.
- Another example is pool block counters, which impact pool performance, reward calculations, and, therefore, consensus. These behaviors also need explicit testing.
To conclude, in my opinion, using sequences of 0+ blocks allows us to model all rules while keeping the inputs sufficiently small for quick run times.
Complete ledger state as outputs
Through trial and error in teaching the C++ implementation to produce state snapshots binary-compatible with Cardano Node's, we’ve learned that almost every component of the ledger state eventually impacts stake distribution or distributed rewards. In turn, this influences consensus.
The only purely informational component we are aware of is non-myopic pool likelihoods. However, since they are relatively small, making an exception for them seems unnecessary.
If needed, we can review the ledger state component by component and present cases where consensus is affected.
JSON vs CBOR
JSON has advantages, including human readability, ease of writing, and readable diffs, which improve developer productivity. However, our experience with test execution, ledger state generation, diff analysis, and test case preparation led us to replace JSON with CBOR. Here’s why:
- File Size & Readability Limitations
- A CBOR ledger state snapshot for a recent mainnet epoch is ~2.5 GB.
- When converted to JSON, it exceeds 10 GB.
- Editors struggle with files this large, making them effectively unreadable.
- Performance & Hardware Constraints
- Engineers often need to quickly generate a reference snapshot from a sequence of blocks.
- Cardano Node requires significantly more time and RAM (it crashed in a VM with 48 GB RAM!) to generate full JSON snapshots of the most recent epochs.
- This makes it impossible to run on a moderate laptop, reducing convenience and increasing execution time.
- Faster Diff Analysis & Tooling Solutions
- When analyzing snapshots, programmatic diff analysis is unavoidable.
- Since CBOR files are smaller, they are faster to process, improving developer efficiency.
- Although CBOR is less readable, a simple CBOR diff script can:
- Print diffs in human-readable format.
- Convert sequences of positional indices (e.g., #0.4.3.2.1.11) into descriptive names.
Despite CBOR’s poor readability, its smaller size and faster processing time, supported by minimal tooling, make it far more efficient for this workflow. In practice, this results in higher developer productivity.
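For illustration, a toy version of such a CBOR diff script (Python with `cbor2`; the `FIELD_NAMES` table and file names are hypothetical, not the node's actual state layout):

```python
import cbor2

# Hypothetical path-to-name table; a real one would mirror the node's
# ledger state layout, e.g. mapping "#.0.4" to "nonMyopic".
FIELD_NAMES = {"#.0.4": "nonMyopic"}

def diff(a, b, path="#"):
    """Recursively compare two decoded CBOR values, printing one line
    per difference with a descriptive path where one is known.
    (Real tooling would also compare floats only up to a few
    significant digits, as described above for pool likelihoods.)"""
    name = FIELD_NAMES.get(path, path)
    if type(a) is not type(b):
        print(f"{name}: {type(a).__name__} != {type(b).__name__}")
    elif isinstance(a, list):
        if len(a) != len(b):
            print(f"{name}: length {len(a)} != {len(b)}")
        for i, (x, y) in enumerate(zip(a, b)):
            diff(x, y, f"{path}.{i}")
    elif isinstance(a, dict):
        for k in a.keys() | b.keys():
            diff(a.get(k), b.get(k), f"{path}.{k}")
    elif a != b:
        print(f"{name}: {a!r} != {b!r}")

with open("node-state.cbor", "rb") as fa, open("our-state.cbor", "rb") as fb:
    diff(cbor2.load(fa), cbor2.load(fb))
```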
A Complete Test Case Data Set
A complete test case naturally includes the previous state snapshot and genesis configuration. Thus, a full data set for a single test case consists of:
- Input — A file with a sequence of CBOR-encoded blocks.
- Input — A set of genesis configuration files used by Cardano Node.
- Input — A CBOR-encoded initial ledger state snapshot (or none, if not applicable).
- Output — A single CBOR-encoded final ledger state snapshot.
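With such a layout, an implementation's test driver reduces to a small loop. A minimal sketch, assuming hypothetical file names (`blocks.cbor`, `initial-state.cbor`, `expected-state.cbor`) and an implementation-provided `apply_blocks` function; genesis loading is omitted for brevity, and `read_blocks` is the helper sketched earlier:

```python
from pathlib import Path
import cbor2

def run_test_case(case_dir: Path, apply_blocks) -> bool:
    """Feed the inputs of one test case to the implementation under
    test and compare the resulting state with the expected snapshot."""
    init_path = case_dir / "initial-state.cbor"
    initial = cbor2.loads(init_path.read_bytes()) if init_path.exists() else None
    blocks = list(read_blocks(case_dir / "blocks.cbor"))
    actual = apply_blocks(initial, blocks)
    expected = cbor2.loads((case_dir / "expected-state.cbor").read_bytes())
    return actual == expected
```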
Benefits of This Format
- Initial test cases can be created by copying data from a Cardano Node instance.
- Ensures coverage of any functionality that affects the ledger state and could lead to non-conformant behavior.
- The data is suitable for further simulation testing, such as the networking tests described by @KtorZ.
Alignment Process
To take a small step forward, I’ve identified key design decisions where alternative suggestions have been proposed, along with their respective proponents. Would it be reasonable to ask each proponent to prepare a minimal working example for their preferred approach?
With concrete examples available, it should be easier and faster to evaluate and align on a final version. Let me know if that works for you.
Design Decisions
- Inputs: Blocks (@sierkov) vs. Transactions (@ch1bo)
- Outputs: Full ledger snapshot (@sierkov) vs. Partial ledger snapshots (@KtorZ)
- Encoding: CBOR (@sierkov) vs. JSON (@KtorZ) for the ledger snapshot
Also, I’d like to list some areas for exploration that seem particularly valuable given the incremental value this new conformance test suite should provide. Please let me know if I’ve missed something.
Areas for Exploration:
- Generate a test case for the simplest possible edge case that does not occur on the mainnet.
- Generate a test case for the simplest possible attack scenario.
- Generate a minimal negative case—blocks rejected under current ledger rules.
- Collect examples where the formal specification is incomplete compared to the reference implementation. (@WhatisRT, did I capture this correctly?)
- Gather proposals for generating a comprehensive set of traces from the formal specification.
@WhatisRT, in my view, one of the most important questions in this discussion is the complexity of generating test cases from the formal specification. Given your experience in this area, could you describe the necessary steps to programmatically create a simple test case of your choice from the Agda formalization? I have practical ideas on how to generate traces, but before considering alternative approaches, I’d like to understand the feasibility of programmatic generation from the specification, which I see as the most comprehensive route.
Additional topics
On Implementation-Specific Drivers
@ch1bo, in my view, the test suite should provide only the data, while the responsibility for building a test driver should lie with each implementation. Since different implementations may have their own objectives and toolsets, it makes sense for them to develop their own drivers.
However, the data format should be designed to allow any implementation to develop an initial test driver quickly (within about a week of development time?). To validate this, a reference driver for Cardano Node could be included. Moreover, benchmarking against the reference implementation may be the most widely shared interest across all implementations, despite their differences.
A good example is the UPLC file format from the Plutus Conformance tests—it is implementation-agnostic and simple enough that an initial UPLC encoder and decoder can be developed within a week, enabling teams to proceed with testing.
Overall, this sounds quite reasonable to me. Some points I'd like to add:
- A ledger implementation that wants to participate in consensus will have to be able to validate transactions and step across epoch boundaries independently from each other. This is why I suggested adding an interface for `TICK`, which gives you what you need (if you can already validate transactions). In fact, some of the logic of validating blocks does not happen in the ledger, but as part of the consensus code - we actually do have a separate consensus spec that implements this logic and just now became executable. It's further away from being used for testing (but it's also a lot smaller, so it should be less work to get it ready for testing), so if you want to validate entire blocks I'd suggest using that instead.
- The spec doesn't provide generators, it just provides a reference implementation. The ledger team worked quite hard on the problem of generating good tests and lots of different approaches have been tried in the past. There is now a quite good library for generating test cases based on constraints - I'd suggest using this to not duplicate years of engineering effort. I'm a bit out of the loop on this, but if the ledger team is interested in supporting this use case for the library I'm sure @lehins can point you in the right direction.
- The constraint based generators library should also in principle be able to provide negative tests, which is quite useful for conformance testing. We talked about doing that a while ago, but I don't know if that's something that was ever implemented.
- In the JSON vs CBOR question, it seems quite obvious to use CBOR to me. It's what is actually being used and it has much better performance characteristics. If you manually want to read the data, you can always convert it to JSON and then you get the best of both.
This is a noble goal. However, it is one that requires an enormous amount of work. I can tell this for a fact from experience of trying to implement conformance testing of the Ledger implementation against the Ledger specification, which is a much smaller task than what is being asked for here. So, I hate to say it, but until we have the budget approved for such a test suite that can be used for testing alternative implementations, and until we have enough people to work on a massive project like that, the Ledger team can't really participate in it.
@ch1bo My opinion on this is that I don't have enough bandwidth to even worry about working on a testing framework for alternative implementations. So, for now, this will have to be a volunteer-driven effort that does not involve the Ledger team. I'll keep this ticket open, since there is a chance that we might get pulled into this effort at some point, but until then we have to focus on work items that we have explicit approval for.
That was my administrative opinion. My personal opinion is that such a testing (or certification project as Charles mentioned it in his AMAs) should be driven by a totally separate team. I strongly believe that Ledger team should not be directly involved in this, since it will more than likely be a totally separate beast. Once a team like that is formed we would be happy to provide our guidance and share our experience we acquired from implementing conformance testing. Constraint generation framework and conformance test suite could potentially be even repurposed to become a more general testing tool like the one desired in this ticket. But again, this is not going to be a responsibility of the Ledger team, until we have appropriate resources and explicit approval, if ever.
@lehins, thank you for being direct about the resourcing situation on your end.
This task does come with associated resources, though they are not explicitly outlined yet since the scope is still being defined. However, all three implementations already have to allocate resources for conformance testing. Additionally, as @ch1bo shared, the Cardano-Blueprint Initiative has expressed resource-backed interest in this effort.
A well-designed conformance test suite could meet the needs of all implementations, making shared contributions more practical than separate efforts. This is why the first stage of the task focuses on aligning scope and approach—its outcome will directly influence the resources available.
Why this issue is created in this repository
- You manage a reference implementation and have deep expertise in this area, making your feedback especially valuable.
- Some components of the test suite could provide incremental value to the reference implementation. For example, @WhatisRT pointed out that negative cases might be an area worth strengthening.
- This repository is the primary reference point for alternative implementations. Keeping this task here increases visibility and may help attract additional resources for future implementation.
Request for Feedback
Would you be open to sharing your thoughts on the following, without making any resource commitments?
- How practical do you find the proposed data format?
- Do you foresee any challenges in structuring the test suite as a pre-generated, implementation-independent dataset rather than implementation-specific code?
- Could any of the discussed testing areas provide additional value to the reference implementation? For example, more comprehensive tests for negative cases or adversarial behavior?
@KtorZ, have you had a chance to discuss the approach to testing ledger conformance with your team? My project has a draft implementation of the ledger supporting Conway governance actions, and active work on conformance testing is likely to start in the coming weeks. If there's a preferred approach, I'd appreciate feedback while there is time to adjust plans.
Not yet; I voluntarily postponed the discussion to the beginning of April, when we have a wider workshop regarding node diversity and ensuring conformance between node implementations. Seems like the right moment to have these conversations.
So far, we've kept pursuing our JSON snapshots and took some time to package them a bit more independently: https://github.com/pragma-org/amaru/tree/main/conformance-tests. They aren't definitive, and producing CBOR snapshots instead would be straightforward. It is true that so far we've only been focusing on PreProd, and I anticipate that producing similar JSON snapshots for mainnet will be challenging. But that's a problem for another day :)
Also, these "only" covers the aggregated ledger state; but we're looking into something that could cover ledger rules as well. A set of meaningful transactions that capture interesting scenarios would be ideal. So far, we haven't spent much efforts on that. I believe this is where collaboration will be interesting. As you rightfully pointed out, just looking at blocks on mainnet doesn't necessarily cover edge-cases, and it has also a strong survivor biais. We need a lot more negative-cases to ensure proper conformance otherwise we risk one implementation being more flexible than another.
@KtorZ, thank you. I take note that you're primarily interested in negative and edge scenarios, ideally presented as a set of transactions. Would it be a problem if such sets of transactions were wrapped into blocks, as described above? Extracting transactions from a block is trivial, so it seems that using blocks should work for everyone. What do you think?
Yup. Although I still see value in scenarios that spread over multiple blocks too; and especially, across epochs. There are sufficiently many things happening at epoch boundaries!
@KtorZ, I fully agree. The ideal input is a sequence of zero or more blocks, like a volatileDB chunk file, but without guarantees on block order. This allows us to simulate out-of-order arrival and other edge cases. Does that work for you?
@KtorZ @Quantumplation You mentioned in previous discussions about this that Ethan is working on this. Any results to share with us here already?
Yes! He's posted an update about it in the Amaru channel, but I'll tag him ( @rrruko ) so he can provide an update here as well.
@ch1bo @Quantumplation @KtorZ
I'm generating test vectors by persisting the CBOR tx, old ledger state, and new ledger state for each tx submitted by the Conway ImpTest suite in cardano-ledger. Two candidates that I've identified for removal from the test vector set are the BBODY spec (consisting of a single test case that produces around 400 vectors) and the v9 Conway spec (the Conway tests are run twice, in both v9 and v10).
The ImpTest suite by itself has 73% expression coverage on the cardano-ledger-conway package (25496/34522). After removing the BBODY spec, the coverage stays at 73% (25305/34359). After removing both the BBODY spec and the v9 spec, the coverage becomes 72% (24784/34357). This cuts the number of vectors roughly in half with a pretty small impact on coverage.
Using default JSON encodings, the file size of the test vector dump is kind of big (121M). I think most of this is from serializing the protocol params multiple times per LedgerState record. One option to cut this down is to serialize the pparams records by hash and include the preimage of each hash in the dump. I haven't fully implemented that yet, but as an experiment, I changed the pparam encoding to null, resulting in a total size of 46M, which gives an idea of the potential savings.
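The hashing idea is simple to sketch (Python with `cbor2`; blake2b-224 here is an arbitrary stand-in echoing Cardano's 28-byte hashes, not necessarily what the dump would use):

```python
import hashlib
import cbor2

def hash_cons(value, preimages: dict) -> str:
    """Replace a repeated substructure (e.g. a pparams record) with the
    hex hash of its CBOR encoding, recording the preimage in `preimages`
    so the dump stays self-contained."""
    encoded = cbor2.dumps(value)
    digest = hashlib.blake2b(encoded, digest_size=28).hexdigest()
    preimages[digest] = encoded
    return digest
```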
I have local unpushed changes that I will share soon.
@rrruko perhaps we should consider CBOR as an encoding here? Given that I assume test vectors are mostly a mix of ledger-state and transactions?
I switched to CBOR encoding for the ledger states and implemented hashing of the pparams records. This brings the total size down to 41M. There is another really big, low-coverage test (TxRefScriptsSizeTooBig, which just produces a single tx that spends many big reference scripts to exceed a size limit) that takes up 13M, which if removed brings the file size of the dump to 28M and brings the coverage to 71% (24162/33901). After that there are no really huge outliers left, though there is still a lot of variance in the file sizes. It's not really clear to me what to filter out at this point. If we excluded the top 50% tests by file size, we'd be left with tests with file sizes of 24K or less, for a total of about 2M. Or maybe there is something else we could do to improve the file sizes further?
Here are some of the biggest files in the dump to give an idea:
...
320K eras/conway/impl/dump/Conway.Imp.ConwayImpSpec - Version 10.UTXOS.PlutusV3 Initialization.Updating CostModels and setting the govPolicy afterwards succeeds
324K eras/conway/impl/dump/pparams-by-hash
332K eras/conway/impl/dump/Conway.Imp.ConwayImpSpec - Version 10.RATIFY.CommitteeMinSize affects in-flight proposals.TreasuryWithdrawal ratifies due to a decrease in CommitteeMinSize
344K eras/conway/impl/dump/Conway.Imp.ConwayImpSpec - Version 10.RATIFY.Voting.Active voting stake.Predefined DReps.AlwaysNoConfidence
412K eras/conway/impl/dump/Conway.Imp.ConwayImpSpec - Version 10.RATIFY.Delaying actions.A delaying action delays all other actions even when all of them may be ratified in the same epoch
424K eras/conway/impl/dump/Conway.Imp.ConwayImpSpec - Version 10.RATIFY.ParameterChange affects existing proposals.SPO.Increasing the threshold prevents a hitherto-ratifiable proposal from being ratified
444K eras/conway/impl/dump/Conway.Imp.ConwayImpSpec - Version 10.RATIFY.ParameterChange affects existing proposals.DRep.Decreasing the threshold ratifies a hitherto-unratifiable proposal
460K eras/conway/impl/dump/Conway.Imp.ConwayImpSpec - Version 10.RATIFY.ParameterChange affects existing proposals.SPO.Decreasing the threshold ratifies a hitherto-unratifiable proposal
496K eras/conway/impl/dump/Conway.Imp.ConwayImpSpec - Version 10.RATIFY.Delaying actions.An action expires when delayed enough even after being ratified.Other lineage
628K eras/conway/impl/dump/Conway.Imp.ConwayImpSpec - Version 10.GOV.Proposals.Consistency.Subtrees are pruned for both enactment and expiry over multiple rounds
880K eras/conway/impl/dump/Conway.Imp.ConwayImpSpec - Version 10.GOV.Proposals.Consistency.Subtrees are pruned when competing proposals are enacted over multiple rounds
1.1M eras/conway/impl/dump/Conway.Imp.ConwayImpSpec - Version 10.GOV.Proposals.Consistency.Subtrees are pruned when proposals expire over multiple rounds
1.2M eras/conway/impl/dump/Conway.Imp.ConwayImpSpec - Version 10.ENACT.Treasury withdrawals.Withdrawals exceeding treasury submitted in several proposals within the same epoch
28M eras/conway/impl/dump
And here is what one of these vectors looks like:
{"cbor":"84a300d901028182582003170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c111314000184a300581d60643b751fd2d910c4fc6755c1701314cf97aa7310c3542cb2f8c825fc011a0078e07403d818590661820159065c59065901000032323232323232323232323322222353232325333573466e1d200000213232332212330010030023232325333573466e1d200000213232323232323232332333333232333233222222222212333333333300100b00a0090080070060050040030023300423232325333573466e1d2000002133221233001003002301435742002600a6ae84d5d1000898132481035054310035573c0046aae74004dd50009aba100d33004001357420184646464a666ae68cdc3a40000042646466644424666002008006004646464a666ae68cdc3a4000004266442466002006004604e6ae84004cc010098d5d09aba20011302a4901035054310035573c0046aae74004dd51aba100330073574200466002eb8d5d09aba2002223232325333573466e1d2002002112200115333573466e1d2000002132122300200330053574200226056921035054310035573c0046aae74004dd50009aba200113025491035054310035573c0046aae74004dd500098009aba100a30013574201260026002eb48c88c008dd58009811111999aab9f001201a23233501a3301e375c6aae74004c014d55cf00098021aba200335742004046660020346ae84018cc004cc06006dd69aba10053232325333573466e1d20000021332212330010030023232325333573466e1d20000021332212330010030023302475a6ae84004c08cd5d09aba200113025491035054310035573c0046aae74004dd51aba10013232325333573466e1d20000021332212330010030023302475a6ae84004c08cd5d09aba2001130254901035054310035573c0046aae74004dd51aba13574400226044921035054310035573c0046aae74004dd51aba10043300175c6ae8400ccc004cc061d710009aba10022322300237580026042446666aae7c00480648cd4060c010d5d080118019aba2002022300d357426ae88004d5d10009aba2001357440026ae88004d5d10009aba2001357440026ae880044c059241035054310035573c0046aae74004dd51aba10033232325333573466e1d2000002132122223003005375c6ae8400454ccd5cd19b87480080084c848888c004014c010d5d08008a999ab9a3370e900200109909111180100298091aba100115333573466e1d20060021321222230040053011357420022602c9201035054310035573c0046aae74004dd51aba1357440064646464a666ae68cdc3a4000004266442466002006004600a6ae84004dd69aba1357440022602c9201035054310035573c0046aae74004dd500091919192999ab9a3370e900000109bae357420022602a921035054310035573c0046aae74004dd500089808a49035054310035573c0046aae74004dd5000911999a80092804928049280492999ab9a3375e00a00c20242c244004244244660020080062646a002002c444646464a666ae68cdc3a400000426600e600c6ae84004c014d5d09aba20011300d4901035054310035573c0046aae74004dd500091091980080180111919192999ab9a3370e900000109909111111180280418029aba100115333573466e1d20020021321222222230070083005357420022a666ae68cdc3a400800426644244444446600c012010600a6ae84004dd71aba1357440022a666ae68cdc3a400c0042664424444444660040120106eb8d5d08009bae357426ae8800454ccd5cd19b87480200084cc8848888888cc004024020dd71aba1001375a6ae84d5d10008a999ab9a3370e90050010891111110020a999ab9a3370e900600108911111100189804a481035054310035573c0046aae74004dd500091919192999ab9a3370e9000001099091180100198029aba100115333573466e1d2002002132333222122333001005004003375a6ae84008dd69aba1001375a6ae84d5d10009aba2001130084901035054310035573c0046aae74004dd500091919192999ab9a3370e900000109909118010019bae357420022a666ae68cdc3a400400426424460020066eb8d5d080089803a481035054310035573c0046aae74004dd500091919192999ab9a3370e900000108910008a999ab9a3370e9001001089100109803249035054310035573c0046aae74004dd5000911919192999ab9a3370e9000001089110010a999ab9a3370e90010010990911180180218029aba100115333573466e1d20040021122200113006491035054310035573c0046aae74004dd5000919319ab9c001002120012323001001230022330020020010183581d706ba8f502e9f994ed5518d3007f705452969a10b01901259a114140561a000f311a5820e88bd757ad5b9bedf372d8
d3f0cf6c962a469db61a265f6418e1ffed86da29ec83581d706ba8f502e9f994ed5518d3007f705452969a10b01901259a114140561a000f311a5820e88bd757ad5b9bedf372d8d3f0cf6c962a469db61a265f6418e1ffed86da29ec825839005c6d8be1305b1bb46470f9648d628fb2be46037412e382fd87638f79abc051f9bd244aa619dbb85d0d8714817fca48b53971a6b9c90a8b801b009fdf42f6491e7d021a00041edba100d90102818258204000cb1414760fd0995428ee2f4552c1d98d39c63a0a5bf087b5c89ccbd4bcf95840d12bfdf411b4e789e40b2e9956813a05476325f888898e71ddc1a77711466cf8750f1916de71b6941f00d81338f7a38e7f14d9f480c9a65f218d0a01b058bf06f5f6","newLedgerState":"828383a0a00084a0a0a0a08482a0a0a0a084a0a0000086a45822c276414128da4e540279b8e1f40ac6ec02a3639d1586a67addd8749516d838790000590685051d60643b751fd2d910c4fc6755c1701314cf97aa7310c3542cb2f8c825fc0083e3c0740001008c5c59065901000032323232323232323232323322222353232325333573466e1d200000213232332212330010030023232325333573466e1d200000213232323232323232332333333232333233222222222212333333333300100b00a0090080070060050040030023300423232325333573466e1d2000002133221233001003002301435742002600a6ae84d5d1000898132481035054310035573c0046aae74004dd50009aba100d33004001357420184646464a666ae68cdc3a40000042646466644424666002008006004646464a666ae68cdc3a4000004266442466002006004604e6ae84004cc010098d5d09aba20011302a4901035054310035573c0046aae74004dd51aba100330073574200466002eb8d5d09aba2002223232325333573466e1d2002002112200115333573466e1d2000002132122300200330053574200226056921035054310035573c0046aae74004dd50009aba200113025491035054310035573c0046aae74004dd500098009aba100a30013574201260026002eb48c88c008dd58009811111999aab9f001201a23233501a3301e375c6aae74004c014d55cf00098021aba200335742004046660020346ae84018cc004cc06006dd69aba10053232325333573466e1d20000021332212330010030023232325333573466e1d20000021332212330010030023302475a6ae84004c08cd5d09aba200113025491035054310035573c0046aae74004dd51aba10013232325333573466e1d20000021332212330010030023302475a6ae84004c08cd5d09aba2001130254901035054310035573c0046aae74004dd51aba13574400226044921035054310035573c0046aae74004dd51aba10043300175c6ae8400ccc004cc061d710009aba10022322300237580026042446666aae7c00480648cd4060c010d5d080118019aba2002022300d357426ae88004d5d10009aba2001357440026ae88004d5d10009aba2001357440026ae880044c059241035054310035573c0046aae74004dd51aba10033232325333573466e1d2000002132122223003005375c6ae8400454ccd5cd19b87480080084c848888c004014c010d5d08008a999ab9a3370e900200109909111180100298091aba100115333573466e1d20060021321222230040053011357420022602c9201035054310035573c0046aae74004dd51aba1357440064646464a666ae68cdc3a4000004266442466002006004600a6ae84004dd69aba1357440022602c9201035054310035573c0046aae74004dd500091919192999ab9a3370e900000109bae357420022602a921035054310035573c0046aae74004dd500089808a49035054310035573c0046aae74004dd5000911999a80092804928049280492999ab9a3375e00a00c20242c244004244244660020080062646a002002c444646464a666ae68cdc3a400000426600e600c6ae84004c014d5d09aba20011300d4901035054310035573c0046aae74004dd500091091980080180111919192999ab9a3370e900000109909111111180280418029aba100115333573466e1d20020021321222222230070083005357420022a666ae68cdc3a400800426644244444446600c012010600a6ae84004dd71aba1357440022a666ae68cdc3a400c0042664424444444660040120106eb8d5d08009bae357426ae8800454ccd5cd19b87480200084cc8848888888cc004024020dd71aba1001375a6ae84d5d10008a999ab9a3370e90050010891111110020a999ab9a3370e900600108911111100189804a481035054310035573c0046aae74004dd500091919192999ab9a3370e9000001099091180100198029aba100115333573466e1d2002002132333222122333001005004003375a6ae84008dd69aba1001375a6ae84d5d10009aba20011300849010
35054310035573c0046aae74004dd500091919192999ab9a3370e900000109909118010019bae357420022a666ae68cdc3a400400426424460020066eb8d5d080089803a481035054310035573c0046aae74004dd500091919192999ab9a3370e900000108910008a999ab9a3370e9001001089100109803249035054310035573c0046aae74004dd5000911919192999ab9a3370e9000001089110010a999ab9a3370e90010010990911180180218029aba100115333573466e1d20040021122200113006491035054310035573c0046aae74004dd5000919319ab9c00100212001232300100123002233002002001015822c276414128da4e540279b8e1f40ac6ec02a3639d1586a67addd8749516d8387901005843011d706ba8f502e9f994ed5518d3007f705452969a10b01901259a1141405600bce21ae88bd757ad5b9bedf372d8d3f0cf6c962a469db61a265f6418e1ffed86da29ec5822c276414128da4e540279b8e1f40ac6ec02a3639d1586a67addd8749516d8387902005843011d706ba8f502e9f994ed5518d3007f705452969a10b01901259a1141405600bce21ae88bd757ad5b9bedf372d8d3f0cf6c962a469db61a265f6418e1ffed86da29ec5822c276414128da4e540279b8e1f40ac6ec02a3639d1586a67addd8749516d83879030058470201abc051f9bd244aa619dbb85d0d8714817fca48b53971a6b9c90a8b80b41b5b30e18b6d5cb28f628d64f97064fd82e312740346be01000000798f638700cff7e8afb2a4bc7d001a00041edb87828480808080808182a28200581c204c5f1bafe8ee289c881cb48d5f1b2abf97e8720d18526561d93be71903938200581c65d36c561e076d2a6fe08172619d48bdcfd9056e4ee79cbf2cfcbccb190393d81e8201018282782368747470733a2f2f63617264616e6f2d636f6e737469747574696f6e2e63727970746f5820e5f5f212e67762d5fc4727a8bf9cdfaaf9dc0ca8e0f5cf51f32a214a5242cd44581cfa24fb305126805cf2164c161d852a0e7330cf988f1fe558cf7d4a645820ff6f3ba8c8cf05f24502f0effaf260dfca6175a04beaa33c12949f64929338625820c4359a5ed626b4b0693cad1d3330c3b5f55a55c0199d725519f3d051ce2827998100828480a0a0a084878182a28200581c204c5f1bafe8ee289c881cb48d5f1b2abf97e8720d18526561d93be71903938200581c65d36c561e076d2a6fe08172619d48bdcfd9056e4ee79cbf2cfcbccb190393d81e8201018282782368747470733a2f2f63617264616e6f2d636f6e737469747574696f6e2e63727970746f5820e5f5f212e67762d5fc4727a8bf9cdfaaf9dc0ca8e0f5cf51f32a214a5242cd44581cfa24fb305126805cf2164c161d852a0e7330cf988f1fe558cf7d4a645820fbdc3a32a537a9cdd07804bb4a903ecba465325a22bbb97e49527e4e2abc8f8c5820c4359a5ed626b4b0693cad1d3330c3b5f55a55c0199d725519f3d051ce28279900a0848080808080d9010280f4a18200581cabc051f9bd244aa619dbb85d0d8714817fca48b53971a6b9c90a8b801b009fdf42f6491e7d00","oldLedgerState":"828383a0a00084a0a0a0a08482a0a0a0a084a0a0000086a1582203170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c11131400005828001d6088028438394946279f9ed8d66d718679b82f26b75391cb6df8107c8f00cff7e8afb7928000000087828480808080808182a28200581c204c5f1bafe8ee289c881cb48d5f1b2abf97e8720d18526561d93be71903938200581c65d36c561e076d2a6fe08172619d48bdcfd9056e4ee79cbf2cfcbccb190393d81e8201018282782368747470733a2f2f63617264616e6f2d636f6e737469747574696f6e2e63727970746f5820e5f5f212e67762d5fc4727a8bf9cdfaaf9dc0ca8e0f5cf51f32a214a5242cd44581cfa24fb305126805cf2164c161d852a0e7330cf988f1fe558cf7d4a645820ff6f3ba8c8cf05f24502f0effaf260dfca6175a04beaa33c12949f64929338625820c4359a5ed626b4b0693cad1d3330c3b5f55a55c0199d725519f3d051ce2827998100828480a0a0a084878182a28200581c204c5f1bafe8ee289c881cb48d5f1b2abf97e8720d18526561d93be71903938200581c65d36c561e076d2a6fe08172619d48bdcfd9056e4ee79cbf2cfcbccb190393d81e8201018282782368747470733a2f2f63617264616e6f2d636f6e737469747574696f6e2e63727970746f5820e5f5f212e67762d5fc4727a8bf9cdfaaf9dc0ca8e0f5cf51f32a214a5242cd44581cfa24fb305126805cf2164c161d852a0e7330cf988f1fe558cf7d4a645820fbdc3a32a537a9cdd07804bb4a903ecba465325a22bbb97e49527e4e2abc8f8c5820c4359a5ed626b4b0693cad1d3330c3b5f55a55c0199d725519f3d051ce28279900a0848080808080d901
0280f4a000","success":true,"testState":"Conway.Imp.ConwayImpSpec - Version 10.UTXOS.can use reference scripts"}
Note that the majority of the size of the "newLedgerState" in this vector comes from inline reference script on one of the utxos, but (as far as I can tell) reference scripts do not account for the majority of the test vector sizes in general. For instance, "Withdrawals exceeding treasury submitted in several proposals within the same epoch" contains 67 test vectors, averaging about 20K per vector.
Code: https://github.com/rrruko/cardano-ledger/commit/10d908717d8e101d32d4f14e711e2b0a0fcb0b26
I think at this point let's enable Git LFS on the Cardano Blueprint repo and get a first iteration committed. @ch1bo
Longer term, if size becomes an issue, one approach to larger tests would be to have a small generator program for producing them on demand.
Alternatively, what happens if you gzip the directory first?
ah, the tarball is much better: 1.7M
Nice; that maybe buys us room to add back in some of the other tests