Option to track statistics via an accumulator only
Currently, Benchee stores all samples while benchmarking. This can cause some problems with memory usage and reporting, like those mentioned in #326 and bencheeorg/benchee_html#3.
Could we introduce a configuration option to instruct Benchee to store only metrics that can be calculated using accumulators? That way, instead of storing each sample, Benchee can store the accumulator and perform the final calculation at the end. This could reduce memory usage (and disk usage when saving previous benchmark runs) significantly.
Some statistics Benchee currently tracks wouldn't be supported under this option, since they can't be calculated via accumulator.
Statistics that would still work:
- Average
- Iterations per second
- Standard Deviation (if variance were calculated through something like Welford's online algorithm)
Statistics that would not work:
- Median
- 99th percentile
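As a sketch of the accumulator idea, Welford's online algorithm keeps just a running count, mean, and sum of squared deviations, from which the average and standard deviation fall out in a final calculation. Illustrative Python, not Benchee code:

```python
def welford_update(state, sample):
    """One accumulator step: fold a new sample into (count, mean, M2)."""
    count, mean, m2 = state
    count += 1
    delta = sample - mean
    mean += delta / count
    m2 += delta * (sample - mean)  # M2 = running sum of squared deviations
    return (count, mean, m2)

def welford_finalize(state):
    """Final calculation: derive average and sample standard deviation."""
    count, mean, m2 = state
    std_dev = (m2 / (count - 1)) ** 0.5 if count > 1 else 0.0
    return mean, std_dev

# Fold samples through the accumulator instead of storing them all.
state = (0, 0.0, 0.0)
for sample in [4.0, 7.0, 13.0, 16.0]:
    state = welford_update(state, sample)
avg, std = welford_finalize(state)  # avg == 10.0
```

Memory stays constant in the number of samples, which is exactly why the median and percentiles can't be computed this way: they need the full (sorted) sample list.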
Thoughts on this idea?
👋
Hi there, first of all thanks a lot for opening an issue 💚
I like the idea and can see the appeal, but while I haven't released it yet (should be getting to it...) I do think that max_sample_size should solve most of these issues. By setting it to a reasonable amount, memory consumption, file size and rendering should all be reasonable. For the graphs in particular, feeding plotly.js the data it needs instead of the raw measurements should also help tremendously.
That is, with the aforementioned measures in place I see a lot less reason to implement an accumulator. And having essentially 2 ways of gathering/calculating statistics has the following downsides imo:
- 2 separate implementations that need to be kept in sync and fixed
- as you mentioned, a different set of statistics available in formatters, which would be a breaking change since so far the promise is that all of these will be available
So my inclination is: let's see how it goes after max_sample_size is released and then go from there. Unless I'm missing a major point, of course, or someone absolutely needs 100 million samples without memory bloating up :)
I think max_sample_size would definitely be helpful, and I agree with you that it's an easier lift architecturally. Even if it's not released, is it present on main and something I can try for our use case?
That being said, there are some benchmarks that might not fit the pattern of "collect the first N samples then discard subsequent samples." My organization's recent interest in benchmarking stems from a surprising performance problem that we've since identified.
Before we had isolated the source of the problem, our steps to replicate involved running our benchmark for 20-30 minutes, and the performance problem would only appear about halfway through that process. I could see that being a gap in the max_sample_size solution. It seems like making the list of samples a sort of ring buffer could help, but I'm not sure of the most performant way to do that on the BEAM, since lists are implemented as singly linked lists.
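The ring-buffer idea above can be sketched with a bounded double-ended queue (illustrative Python using `collections.deque`; on the BEAM, something like Erlang's `:queue` with an explicit length counter would be the rough analogue):

```python
from collections import deque

# Keep only the most recent `maxlen` samples; older ones are discarded
# automatically, so a slowdown that appears late in a long run still
# shows up in the retained window.
recent_samples = deque(maxlen=3)
for sample in [10, 11, 12, 95, 97, 99]:  # slowdown appears halfway through
    recent_samples.append(sample)

print(list(recent_samples))  # only the late, slow samples remain
```

The trade-off versus "first N samples" is that statistics then describe the tail of the run rather than its start, which is exactly what this use case wants.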
Just wanted to add a datapoint after some experimentation today. I pulled the latest main and started running my benchmark with the new default of max_sample_size: 1_000_000. Things are definitely faster now, but each benchmark results file when output to JSON is still about 4MB.
In order for this to be useful to me, I've had to manually pipe the Suite through a function that deletes the samples from each Benchee.CollectionData before saving the suite to disk. That has helped tremendously, and still allows me to see statistics. I'd suggest this be an option when using a formatter (to choose whether to persist the samples or just the statistics).
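The workaround described above, dropping the raw samples while keeping the statistics before serializing, can be sketched generically (illustrative Python; the nested field names here mimic the suite shape but are assumptions, not the actual Benchee.Suite struct):

```python
import json

def drop_samples(suite):
    """Remove raw samples from each scenario, keeping only the statistics."""
    for scenario in suite["scenarios"]:
        # Keep "statistics", discard the potentially huge "samples" list.
        scenario["run_time_data"].pop("samples", None)
    return suite

suite = {
    "scenarios": [
        {"run_time_data": {"samples": [1, 2, 3], "statistics": {"average": 2.0}}}
    ]
}
out = json.dumps(drop_samples(suite))  # serialized output contains no samples
```

The output file then scales with the number of scenarios rather than the number of samples.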
👋
Let me do the whole apology here. First, really, thanks a lot for your contributions @weaversam8 - I do appreciate them, even if my silence here doesn't quite show it 💚
When I last wrote here I had been at the funeral of, let's say, a close family friend a couple of days prior. I had also started a new job that month. At the end of that month one of my closest and best friends died. It's been some difficult months - so apologies. I'm trying to catch up and get more cadence in again!
For the questions/discussion:
Yes, max_sample_size is on main; generally I try to keep the changelog updated. I should really release it one of these days 😅
And aha... that's a very intriguing/interesting use case to keep in mind. To clarify, max_sample_size collects n samples and then stops collecting, and you can set n. Depending on your benchmark that may or may not be enough (it depends on the time each iteration takes). But I see the use case; that's almost more like a system stress test if I may say so 🤔
@weaversam8 why is a 4MB file not useful to you? That seems... relatively small? 🤔
The problem with adding it to the interface of formatters (besides complexity) is the contract of values suddenly being there or not being there. Although I guess benchee_html calls the json formatter itself, so it's fine. I'm happy to welcome a PR to add that to benchee_json. Or do you mean the general save feature? That said, it's happily one of benchee's design goals that in these cases you can write your own formatters or deal with the suite data yourself if you wish to - I'm happy that that works :)