
Spec-tests format changes [discussion]

protolambda opened this issue 4 years ago • 9 comments

Various client v1.1 releases were made, and before we get deep into HF1, I think it may be worth it to upgrade the spec-tests format. This way we can sustainably keep adding more tests without them filling up too much disk space with test states. There are various ways of going about this:

  • compression: Beacon states, especially mainnet states, contain long lists of data, which can be compressed to greatly reduce size
  • diffs: Tests focus on edge cases, and these are variants of the same general thing. These slight modifications result in a ton of extra data, while the differences are subtle. Diffs reduce that.
  • removal of legacy YAML: The legacy YAML was kept around for readability, but I believe everyone has moved on to reading test data from the SSZ files (except for the small metadata files, which just use YAML). The API uses JSON, so the YAML support for datatypes can be dropped (if you have not already). If we drop it, we can provide some alternative to make the SSZ files more readable (in the form of a spec-tests website, or a tool of some kind).
  • deduplication: Maybe inferior to diffs, deduplication of files as a whole can still potentially help. This would mean test files go into some big DB/directory/collection of some kind, and tests reference these.
  • zip it: Similar to compression, but naive: the current .tar.gz doesn't allow random access to tests, but does compress it all down to < 300 MB. If test runners could read directly from such a format, it might work.
  • Other? Suggestions welcome!

There is PR #2097 up already (just to give you an idea) to reduce it from 11 GB to 0.5 GB (at the time of writing the PR) with compression and removal of YAML. But I think we can do better, and for that I need your feedback as implementers. Do you have suggestions, specific needs, feedback on the current format, or ideas for compatibility with tests of new features?
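
To make the compression-plus-no-YAML option concrete on the consumer side, here is a minimal Python sketch of a test runner reading a block-snappy-compressed SSZ file. The .ssz_snappy extension and the example path are illustrative assumptions, not a final format.

    import snappy  # python-snappy; block (non-framing) format, as on gossip

    def load_ssz_snappy(path: str) -> bytes:
        """Read a snappy-compressed SSZ file and return the raw SSZ bytes."""
        with open(path, "rb") as f:
            return snappy.uncompress(f.read())

    # Hypothetical path; the real layout is whatever the test format defines.
    pre_state_ssz = load_ssz_snappy(
        "tests/mainnet/phase0/sanity/blocks/some_case/pre.ssz_snappy"
    )
    # ...feed pre_state_ssz into the client's BeaconState SSZ decoder.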

Marking this as discussion/RFC; we really need feedback from implementers, since the tests should not continue to grow to absurd disk space.

protolambda avatar Jan 19 '21 13:01 protolambda

Compression and "zip it" seem redundant; was that intended?

Regarding the zipping format, I think it's important that decompression speed is fast. As tests multiply, CI time is growing, and we're already not sure how to handle all the hard forks and phases that will also need testing while keeping a healthy feedback loop on our PRs.

In terms of format we have:

  • Snappy: everyone supports Snappy due to the networking requirement. It's very fast at compression/decompression, but the compression ratio is low. That said, all cryptographic data like root hashes, keys, and signatures is incompressible, so it might not be that big of a blocker.
  • Zip: supported by default on all OSes (although macOS does strange things with it); supports streaming, so we can read directly without full decompression/temporaries.
  • RAR: proprietary.
  • tar.gz: allows a streaming implementation, so we can read directly without full decompression/temporaries.
  • Zstd: likely the best trade-off between decompression speed and compression ratio; now growing in popularity.
  • XZ: very slow to compress, high compression ratio, decent decompression speed.
  • Brotli: similar goals as Zstd, but less popular, with a lower compression ratio/decompression speed.
  • lzma/7z/lzip: high compression ratio, popular, slow to compress.
  • lzo: similar goals to Snappy, with a focus on decompression speed.
  • lz4: similar goals to Snappy, with a focus on decompression speed.

I would pick the format from among Snappy, tar.gz, and Zstd. Quick Google-Fu suggests that all of them support streaming, so we can decompress without intermediate files.
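
As a rough illustration of the streaming point (assuming Python and the standard-library tarfile module; the archive name is just an example), a runner could consume a .tar.gz without writing temporaries:

    import tarfile

    # "r|gz" opens the archive in streaming mode: members are decompressed
    # on the fly, with no extraction to disk.
    with tarfile.open("general.tar.gz", mode="r|gz") as archive:
        for member in archive:
            if member.isfile() and member.name.endswith(".ssz"):
                data = archive.extractfile(member).read()
                # ...dispatch `data` to the matching test handler here.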

  • Diffs: If we have compression, this is taken care of; no need to implement that ourselves. It also makes it easier to feed one specific file when debugging.
  • Removal of YAML: OK.
  • Deduplication: If we have compression, this is taken care of as well.

mratsim avatar Jan 19 '21 13:01 mratsim

@mratsim

Compression and zip it seem redundant, was it intended?

The difference is compressing individual files vs. compressing a whole directory of files.

protolambda avatar Jan 19 '21 13:01 protolambda

Another important thing to consider: it would be nice if we can avoid causing big test-size increases with every fork. Currently, phase1 repeats all phase0 tests, but with phase1 types, to make sure that all still works. But if we keep adding forks, every fork effectively repeats the whole test collection with subtle changes, doubling the size. I am starting to think that we need more powerful diffing than just compression.
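
Purely as a thought experiment on what stronger diffing could look like (not something any PR proposes): store a single base SSZ state per test family plus a binary patch per variant. The sketch below assumes Python and the bsdiff4 package; all names are hypothetical.

    import bsdiff4

    def write_variant(base_ssz: bytes, variant_ssz: bytes, patch_path: str) -> None:
        # Store only the binary delta between the base state and the variant.
        with open(patch_path, "wb") as f:
            f.write(bsdiff4.diff(base_ssz, variant_ssz))

    def read_variant(base_ssz: bytes, patch_path: str) -> bytes:
        # Reconstruct the variant state from the base state plus its patch.
        with open(patch_path, "rb") as f:
            return bsdiff4.patch(base_ssz, f.read())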

protolambda avatar Jan 19 '21 14:01 protolambda

For Prysm, we are quite happy with the format. We read the tar.gz files from each tagged release. As far as I know, we prefer SSZ over any YAML definition, so removing the YAML files would be welcome.

prestonvanloon avatar Jan 19 '21 17:01 prestonvanloon

I can chime in, however @michaelsproul has also had a lot of experience in this area.

I'm happy to see the YAML files go. Just to be clear, I checked the PR and it seems we're dropping YAML where there's already an equivalent SSZ (plus a few extra cases), but we're keeping it for other metadata stuff (spec constants, etc). Can you please confirm @protolambda?

From #2097:

The result is that the test generation outputs go from ~11 GB to ~0.5 GB. Generation time is also faster, since it reduces disk IO by a lot.

Is this reduction primarily from dropping YAML? Also, is 0.5 GB the uncompressed size? (I assume so; my uncompressed specs dir is ~5 GB.)

I'm going to assume we can achieve 0.5 GB uncompressed. My suggestion is that we remove YAML but stick with the status quo of .tar.gz archives instead of snappy.

The current size of the test files is not a concern to me; GitHub Actions (CI) is fine with it, and the files fit easily on my computers. If we've dropped that size by 95%, then I think we've made a great gain and there's no outstanding need for a partial compression scheme (e.g. individually compressed files).

paulhauner avatar Jan 19 '21 21:01 paulhauner

@paulhauner

Can you please confirm @protolambda?

Exactly. We keep it for the configs (totally unrelated here really) and for test meta files (e.g. the root of something, a count of something, etc.)

Is this reduction primarily from dropping YAML? Also, is 0.5 GB the uncompressed size? (I assume so, my uncompressed specs dir is ~5GB.)

This is after dropping YAML, as well as snappy-compression (like in gossip, non-framing) on each individual SSZ file.

If you compress everything as a whole (the tar file) you get some more deduplication: the combined size of .tar.gz files for mainnet/minimal/general dirs is ~230 MB. But no random access to tests.

I'm going assume we can achieve 0.5GB uncompressed

How? Did I miss something? If it's not compressed, how do you get 5 GB down to 0.5 GB? The 0.5 GB I mentioned was the size required for the snappy + no-YAML approach.

If you just remove YAML, you go from 6.1 GB to 2 GB. Nice improvement, but not good enough long-term in my opinion.

My suggestion is that we remove YAML but stick with the status quo of .tar.gz archives instead of snappy.

Directly using .tar.gz would mean that to read a single test, you would need to decompress the whole thing and search for the file in the tar archive (I think there are no offsets, just a concatenation of headers and file bodies afaik; correct me if I'm wrong). Compressing individual files with snappy has the benefit that the client can decompress them during test running while still fetching exactly the files it needs to run: random access to test resources.
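
To illustrate that difference with a Python sketch (paths are hypothetical): pulling one member out of a .tar.gz means gunzipping and scanning the archive until the entry is found, while per-file snappy lets the runner open exactly the file it needs.

    import tarfile
    import snappy

    CASE = "tests/mainnet/phase0/sanity/blocks/some_case"

    # tar.gz: no offset index, so locating one member means scanning the archive.
    with tarfile.open("mainnet.tar.gz", mode="r:gz") as archive:
        member = archive.getmember(CASE + "/pre.ssz")
        pre_state_ssz = archive.extractfile(member).read()

    # Per-file snappy: open only the file this test case needs.
    with open(CASE + "/pre.ssz_snappy", "rb") as f:
        pre_state_ssz = snappy.uncompress(f.read())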

The current size of the test files is not a concern to me; Github Actions (CI) is fine with it

With HF1 we repeat all phase 0 tests. So the size (6 GB or so) doubles (that 11 GB included some phase1 tests iirc, but mostly phase1 variants of tests), and then we add the HF1 tests. Then for the next fork we add more tests again, and repeat the phase0 and HF1 tests with new types. I think we should prepare for 3x-5x the current size in tests if we keep the old approach and have to dump uncompressed files somewhere on disk.

protolambda avatar Jan 19 '21 21:01 protolambda

Compressing individual files with snappy has the benefit that the client can decompress them during test running while still fetching exactly the files it needs to run: random access to test resources.

This wouldn't be a benefit in Prysm; it would make tests slower. We unpack the whole tar.gz, and all tests read from the same directory.

Edit: I haven't tried it, but I assume it means more work in the test and would therefore make it slower.

prestonvanloon avatar Jan 19 '21 22:01 prestonvanloon

If you just remove YAML, you go from 6.1 GB to 2 GB. Nice improvement, but not good enough long-term in my opinion.

Oh, yeah. So 0.5 GB is still compressed with snappy on individual files.

I think we should prepare for 3x-5x the current size in tests if we keep the old approach, and have to dump uncompressed files somewhere on disk.

Personally, I'm not concerned about a 5 × 2 GB = 10 GB test suite. That's not going to break our CI, and I can handle 10 GB on all of my computers quite easily. Lighthouse is structured so you can opt in/out of running the spec tests; only developers who really want to do spec stuff need to download those tests. It's generally only our core devs working on consensus logic who need these tests; everyone else just sees them pass on CI.

I also don't really mind if we get a bit of a slow-down from snappy; I doubt it'll be very significant. Updating our CI and testing stuff would be an annoyance, but manageable.

From my perspective, individual snappy compression is solving a problem that I don't have. But perhaps there are other users out there who are more affected by the large size, and I'm OK with doing some work on my end to make their lives easier.

paulhauner avatar Jan 19 '21 23:01 paulhauner

Teku doesn't use the YAML either so getting rid of that is an easy win.

We use the caching functionality in CircleCI to cache the expanded contents of the current .tar.gz (and similarly just expand it once on dev machines). That cache is currently 222 MiB according to CircleCI, so I'm guessing it's already compressing the files really well when storing and restoring. But it takes nearly a minute to restore that cache, which hurts a fair bit; it would be interesting to see what snappy compression on individual files does to that time. Actually running the tests is something we can easily parallelise in CI, so the cost of decompressing snappy doesn't matter too much there.

Implementing support for snappy decompression of the SSZ files is pretty simple for us. It would probably make life better for individual devs, so they don't use so much disk space. But like Lighthouse, you don't have to download these tests for Teku; mostly we just let CI run them because it's so much faster, so I don't see it as a problem. If it does become a problem, I'd probably just add an option to only expand the tests for specific hard forks, or possibly specific categories of tests, since most of the time when running locally you want to manually run a small set of tests to see why they failed in CI.
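
Such a filter could be as small as a path-prefix check when expanding the archive; a minimal sketch, assuming Python, with fork names and paths that are purely illustrative:

    import tarfile

    WANTED_PREFIXES = ("tests/mainnet/phase0/", "tests/minimal/phase0/")

    with tarfile.open("mainnet.tar.gz", mode="r:gz") as archive:
        members = [m for m in archive.getmembers()
                   if m.name.startswith(WANTED_PREFIXES)]
        archive.extractall(path="spec-tests", members=members)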

ajsutton avatar Jan 19 '21 23:01 ajsutton

I am closing this issue because it seems stale. Please do not hesitate to reopen it if this is a mistake.

leolara avatar Jun 04 '25 09:06 leolara