write a few high-level benchmarks to cover most common code paths for IPLD


We currently have quite a lot of micro-benchmarks that each exercise a very specific code path, for example BenchmarkSpec_Marshal_Map3StrInt_CodecNull. This is great for micro-benchmarking changes to those specific areas, but not for high-level benchmarking of IPLD as a whole.

We could add a few benchmarks that cover the most common ways IPLD is used, while combining many data model types or schema shapes at the same time. For example, imagine a schema that uses many types at once, plus a couple of benchmarks for encoding/decoding it with JSON and CBOR (a rough sketch follows below). What other settings/dimensions might be good to include here without ending up with 20 benchmarks, @warpfork?
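A rough sketch of what one such high-level benchmark could look like, written against the builder and codec entry points of current go-ipld-prime (the tiny fixture, the `bench` package name, and the exact import paths are illustrative assumptions, not code that exists in the repo):

```go
package bench

import (
	"bytes"
	"io"
	"testing"

	"github.com/ipld/go-ipld-prime/codec/dagcbor"
	"github.com/ipld/go-ipld-prime/codec/dagjson"
	"github.com/ipld/go-ipld-prime/datamodel"
	"github.com/ipld/go-ipld-prime/fluent"
	"github.com/ipld/go-ipld-prime/node/basicnode"
)

// BenchmarkRoundtrip pushes one small fixture through an encode/decode
// round-trip per codec, so codec and node-builder costs show up together.
func BenchmarkRoundtrip(b *testing.B) {
	// A stand-in fixture; the point of this issue is to design a much more
	// "representative" one.
	n := fluent.MustBuildMap(basicnode.Prototype.Map, 2, func(ma fluent.MapAssembler) {
		ma.AssembleEntry("name").AssignString("ipld")
		ma.AssembleEntry("count").AssignInt(42)
	})
	codecs := []struct {
		name   string
		encode func(datamodel.Node, io.Writer) error
		decode func(datamodel.NodeAssembler, io.Reader) error
	}{
		{"dag-json", dagjson.Encode, dagjson.Decode},
		{"dag-cbor", dagcbor.Encode, dagcbor.Decode},
	}
	for _, c := range codecs {
		b.Run(c.name, func(b *testing.B) {
			var buf bytes.Buffer
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				buf.Reset()
				if err := c.encode(n, &buf); err != nil {
					b.Fatal(err)
				}
				// Decode into a fresh untyped builder each iteration.
				nb := basicnode.Prototype.Any.NewBuilder()
				if err := c.decode(nb, &buf); err != nil {
					b.Fatal(err)
				}
			}
		})
	}
}
```

Running `go test -bench=Roundtrip -benchmem` would then yield one directly comparable ns/op and allocs/op line per codec.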

mvdan avatar Aug 28 '20 15:08 mvdan

Misc thoughts, which you can take or leave and prioritize freely:

  • Testing things against cbor vs json vs devnull is pretty good for ferreting out whether the next efforts for improvement should target the codecs or the node implementations. But of course it can also be argued that anything other than devnull (or pick one, any one) is not useful to test as a dimension against all node implementations.
  • It's nice to have at least a few tests that take a scale parameter, e.g. a map with $N members, tested at {1, 2, 4, 8, ...} so the results can be graphed (see the sketch after this list).
  • I actually do like to have N=1 (or N=3; some small number) for some of these things, so it's possible to look at especially the allocs-per-op number, manually count out what we expect, and then get appropriately mad if the integer isn't exactly what we expect.
  • Benchmarks that are also going to work on schema nodes with struct types (or union types, or etc -- anything that has specific data shapes) have to have a fixed N and some, well, known structure (e.g. for struct types, map keys have to match struct field names). These benchmarks should also be aimed at non-schema node implementations like basicnode, so that we can compare the performance of typed and untyped stuff (a sketch of such a comparison follows at the end of this comment).
  • On top of all this we want some fixtures that cover a wide "representative" (there's no such thing, but let's say!) data sample.
  • There are a few special shapes that can be interesting to look at: say, a map full of lists of strings (to test whether maps cache/reuse memory for list assemblers), and other nestings like this.
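For the scale-parameter bullet above, here's a hedged sketch of what the doubling-sizes benchmark might look like (the names are made up, and only core builder APIs are used):

```go
package bench

import (
	"fmt"
	"testing"

	"github.com/ipld/go-ipld-prime/node/basicnode"
)

// BenchmarkMapBuildScale builds a map of n string->int entries via basicnode,
// at doubling sizes, so ns/op and allocs/op can be plotted against n.
func BenchmarkMapBuildScale(b *testing.B) {
	for _, n := range []int{1, 2, 4, 8, 16, 32, 64} {
		b.Run(fmt.Sprintf("n=%d", n), func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				nb := basicnode.Prototype.Map.NewBuilder()
				ma, err := nb.BeginMap(int64(n))
				if err != nil {
					b.Fatal(err)
				}
				for j := 0; j < n; j++ {
					// Errors from the Assign* calls are elided for brevity.
					ma.AssembleKey().AssignString(fmt.Sprintf("k%d", j))
					ma.AssembleValue().AssignInt(int64(j))
				}
				if err := ma.Finish(); err != nil {
					b.Fatal(err)
				}
				_ = nb.Build()
			}
		})
	}
}
```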

I think you can try to cram as many of these as you like into one "(we'll-call-it-)representative" benchmark fixture -- and it's neat and useful to have one big composite thing that we can point at for a holistic number -- but all these other things are also useful, and it's very unlikely that we'll end up with fewer than 20 benchmarks IMO :)
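And for the typed-vs-untyped comparison mentioned in the struct-types bullet, one possible shape, written against today's library (`bindnode` and `ipld.LoadSchemaBytes` postdate this thread, so treat this purely as an illustrative sketch):

```go
package bench

import (
	"io"
	"testing"

	ipld "github.com/ipld/go-ipld-prime"
	"github.com/ipld/go-ipld-prime/codec/dagjson"
	"github.com/ipld/go-ipld-prime/datamodel"
	"github.com/ipld/go-ipld-prime/fluent"
	"github.com/ipld/go-ipld-prime/node/basicnode"
	"github.com/ipld/go-ipld-prime/node/bindnode"
)

// Person is the Go mirror of the schema struct below; bindnode maps
// schema Int to int64.
type Person struct {
	Name string
	Age  int64
}

// BenchmarkTypedVsUntyped encodes the same data once through a typed
// (schema-backed) node and once through untyped basicnode, so the cost
// of the typed layer can be compared directly.
func BenchmarkTypedVsUntyped(b *testing.B) {
	ts, err := ipld.LoadSchemaBytes([]byte(`
		type Person struct {
			Name String
			Age  Int
		}
	`))
	if err != nil {
		b.Fatal(err)
	}
	typed := bindnode.Wrap(&Person{Name: "ipld", Age: 7}, ts.TypeByName("Person"))
	untyped := fluent.MustBuildMap(basicnode.Prototype.Map, 2, func(ma fluent.MapAssembler) {
		ma.AssembleEntry("Name").AssignString("ipld")
		ma.AssembleEntry("Age").AssignInt(7)
	})
	for _, bc := range []struct {
		name string
		node datamodel.Node
	}{
		{"typed/bindnode", typed.Representation()},
		{"untyped/basicnode", untyped},
	} {
		b.Run(bc.name, func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				// io.Discard plays the role of the "devnull" dimension here.
				if err := dagjson.Encode(bc.node, io.Discard); err != nil {
					b.Fatal(err)
				}
			}
		})
	}
}
```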

warpfork avatar Aug 28 '20 23:08 warpfork

I found a couple of interesting benchmark fixtures around the internet today that might be worth reusing here (although all of these are json, so, eh):

  • https://github.com/chadaustin/sajson/tree/master/testdata has some large corpora of interesting real-world examples with various traits. (https://chadaustin.me/2017/06/json-never-dies-an-efficient-queryable-binary-encoding/ is a blog post which talks about them and their traits a bit more, and also has some charts created by tests using them.)
  • https://github.com/alecthomas/go_serialization_benchmarks contains a ton of benchmarks that might also be interesting to use as a standard; it covers many different serialization libraries.

warpfork avatar Aug 31 '20 23:08 warpfork