New analysis format
A new implementation of Zinc's incremental state serialization.
- Full structural serialization (like the existing protobuf format), no shortcuts with sbinary or Java serialization (like the existing text format).
- A single implementation that supports an efficient binary format for production use and a text format for development and debugging.
- Consistent output files: If two compiler runs result in the same internal representation of incremental state (after applying WriteMappers), they produce identical zinc files. This is important for build tools like Bazel where skipping a build entirely when the outputs are identical is much cheaper than having to run Zinc to perform the incremental state analysis.
- Smaller output files than the existing binary format.
- Faster serialization and deserialization than the existing binary format.
- Smaller implementation than either of the existing formats.
- Optional unsorted output that trades consistency and small file sizes for much faster writing.
Benchmark data based on scala-library + reflect + compiler:
| Write time | Read time | File size | |
|---|---|---|---|
| sbt Text | 1002 ms | 791 ms | ~ 7102 kB |
| sbt Binary | 654 ms | 277 ms | ~ 6182 kB |
| ConsistentBinary | 157 ms | 100 ms | 3097 kB |
| ConsistentBinary (unsorted) | 79 ms | ~ 3796 kB |
This PR makes the new format available via the new ConsistentFileAnalysisStore. It does not replace the existing formats (but it should; it's a better choice for almost every use case).
We have been using iterations of this format internally over the last few months for the Bazel build (with our own Zinc-based tooling) of our two main monorepos totaling about 27000 Scala (+ Java/mixed) targets ranging in size from a few LOC to almost 1 million LOC.
The 3 large binary files clearly don't belong into this repo but I don't know what's the right place for them. They are used by the integration tests and the benchmarks. In our internal repo we keep such files in S3 instead of checking them into git.
The 3 large binary files clearly don't belong into this repo but I don't know what's the right place for them. They are used by the integration tests and the benchmarks. In our internal repo we keep such files in S3 instead of checking them into git.
Maybe we can store them via Git LFS?
I have never used Git LFS, so ideally people with prior experience with it can judge if this is indeed a good idea.
I don't understand why the benchmarks are failing in CI. I added the new class to the existing benchmark project so it should be found. I was not able to reproduce the problem locally. The benchmark commands from CI run just fine on my machine.
Oh, it's the "benchmark against develop branch" step that fails. This seems to be the usual undercompilation bug of sbt-jmh. It usually needs an explicit clean when you remove a benchmark. Switching to the develop branch after building from this PR would cause it. This problem should go away after merging, but explicitly cleaning the jmh project would be a better fix.
How about : https://github.com/HebiRobotics/QuickBuffers and https://github.com/google/flatbuffers
For the formats? seems fast too.
bump @eed3si9n, are there any blockers from merging this? We've been using this internally with good success, and expect it would provide a lot of value to the broader community to have this upstreamed for SBT and Mill and other build tools to use
bump @eed3si9n, are there any blockers from merging this? We've been using this internally with good success, and expect it would provide a lot of value to the broader community to have this upstreamed for SBT and Mill and other build tools to use
Overall I am onboard with more stable / hermetic analysis serializer. My main line of concern was where the speedup gain was coming from (ExecutionContext.global) and if it would end up competing for CPU attention in non-Bazel usage. Now that ExecutionContext is a parameter, hopefully we can tweak how much thread we assign to this vs other tasks.
@eed3si9n would you have time to do a 1.10.0-M4 release that includes this PR?
Yea. I'll try to get something out this weekend.
https://github.com/sbt/zinc/releases/tag/v1.10.0-RC1 is on its way to Maven Central.