
Check if using `simd-json` to parse rustdoc produces any speedup

Open • obi1kenobi opened this issue 6 months ago • 17 comments

Many of the rustdoc JSON files we parse are large, ranging from a few MB to ~500 MB. In the largest cases, we spend ~5 s parsing JSON per `cargo-semver-checks` run.

Speeding up JSON parsing by switching from `serde_json` to `simd-json` might be an easy win. Let's check!

  • [ ] rewrite the JSON loading code to use `simd-json`
  • [ ] benchmark (on your own machine) the performance difference when loading the rustdoc JSON of a large crate such as `aws-sdk-ec2`
    • to generate rustdoc JSON, clone the repository containing that crate, `cd` into the crate's directory, and run `RUSTDOCFLAGS="-Z unstable-options --document-private-items --document-hidden-items --output-format=json --cap-lints=allow" cargo doc --lib --no-deps` (the `-Z` flag requires a nightly toolchain, e.g. `cargo +nightly doc`). This writes `aws_sdk_ec2.json` (the crate name with hyphens replaced by underscores) into the workspace's `target/doc` directory
    • consider using a benchmarking tool such as `divan` or `criterion` to make sure the performance delta is real and not a measurement artifact
  • [ ] loading smaller rustdoc JSON files shouldn't regress either: a regular `cargo test` run loads dozens of small JSON files, so you can check those
  • [ ] bonus: as a separate PR, produce a benchmark of sufficient quality that we could include it in the repo itself and reuse it for future optimizations

This is a good first issue for someone who is already familiar with Rust and wants to start contributing to this project. It might not be a good first issue for folks new to Rust.

obi1kenobi, Aug 25 '24 15:08