quickbench Draft: feat: --max-bytes-used (#2)

When given the --max-bytes-used flag, quickbench now also reports the maximum residency of the program under test. This works for Haskell programs only, and they need to be built (linked) with -rtsopts.

Example output:

+----------------------------------------------++---------------------------++-----------------------------+
|                                              ||     Time (s)              || Max bytes used              |
+==============================================++===========================++=============================+
|                                              || hledger-1.40 hledger-1.41 ||   hledger-1.40 hledger-1.41 |
+==============================================++===========================++=============================+
| -f examples/1ktxns-1kaccts.journal balance   ||         0.21         0.21 ||          3.84M        4.28M |
| -f examples/2ktxns-1kaccts.journal balance   ||         0.35         0.33 ||          7.43M        6.39M |
| -f examples/3ktxns-1kaccts.journal balance   ||         0.49         0.49 ||         10.48M       11.59M |
| -f examples/4ktxns-1kaccts.journal balance   ||         0.53         0.36 ||         14.61M       11.49M |
| -f examples/5ktxns-1kaccts.journal balance   ||         0.47         0.41 ||         18.32M       15.52M |
| -f examples/6ktxns-1kaccts.journal balance   ||         0.47         0.49 ||         21.72M       21.60M |
| -f examples/7ktxns-1kaccts.journal balance   ||         0.55         0.57 ||         22.35M       25.17M |
| -f examples/8ktxns-1kaccts.journal balance   ||         0.61         0.64 ||         22.28M       24.02M |
| -f examples/9ktxns-1kaccts.journal balance   ||         0.70         0.67 ||         31.41M       24.03M |
| -f examples/10ktxns-1kaccts.journal balance  ||         0.77         0.78 ||         36.00M       35.56M |
| -f examples/20ktxns-1kaccts.journal balance  ||         1.52         1.55 ||         72.62M       72.62M |
| -f examples/30ktxns-1kaccts.journal balance  ||         2.19         2.29 ||         85.87M       96.99M |
| -f examples/40ktxns-1kaccts.journal balance  ||         2.91         3.04 ||        120.17M      130.21M |
| -f examples/50ktxns-1kaccts.journal balance  ||         3.60         3.62 ||        129.01M      140.33M |
| -f examples/60ktxns-1kaccts.journal balance  ||         4.28         4.44 ||        162.47M      175.05M |
| -f examples/70ktxns-1kaccts.journal balance  ||         4.99         5.07 ||        195.87M      207.53M |
| -f examples/80ktxns-1kaccts.journal balance  ||         5.60         5.76 ||        211.37M      219.93M |
| -f examples/90ktxns-1kaccts.journal balance  ||         6.34         6.49 ||        241.11M      250.92M |
| -f examples/100ktxns-1kaccts.journal balance ||         6.90         7.01 ||        255.24M      264.18M |
+----------------------------------------------++---------------------------++-----------------------------+

Mar 03 '25 07:03 thomie

Hello @simonmichael,

In #6, I just added --max-bytes-used.

It would be quite easy to support any or all of the following GHC runtime stats:

 [("bytes allocated", "25546720")
 ,("num_GCs", "9")
 ,("average_bytes_used", "280828")
 ,("max_bytes_used", "836352")
 ,("num_byte_usage_samples", "4")
 ,("peak_megabytes_allocated", "8")
 ,("init_cpu_seconds", "0.000944")
 ,("init_wall_seconds", "0.000727")
 ,("mut_cpu_seconds", "0.019297")
 ,("mut_wall_seconds", "0.762342")
 ,("GC_cpu_seconds", "0.005914")
 ,("GC_wall_seconds", "0.006002")
 ,("exit_cpu_seconds", "0.000326")
 ,("exit_wall_seconds", "0.001238")
 ,("total_cpu_seconds", "0.026521")
 ,("total_wall_seconds", "0.770320")
 ,("major_gcs", "4")
 ,("allocated_bytes", "25546720")
 ,("max_live_bytes", "836352")
 ,("max_large_objects_bytes", "102448")
 ,("max_compact_bytes", "0")
 ,("max_slop_bytes", "45504")
 ,("max_mem_in_use_bytes", "8388608")
 ,("cumulative_live_bytes", "1123312")
 ,("copied_bytes", "4343224")
 ,("par_copied_bytes", "0")
 ,("cumulative_par_max_copied_bytes", "0")
 ,("cumulative_par_balanced_copied_bytes", "0")
 ,("fragmentation_bytes", "0")
 ,("alloc_rate", "1323842590")
 ,("productivity_cpu_percent", "0.729142")
 ,("productivity_wall_percent", "0.989658")
 ,("bound_task_count", "1")
 ,("sparks_count", "0")
 ,("sparks_converted", "0")
 ,("sparks_overflowed", "0")
 ,("sparks_dud ", "0")
 ,("sparks_gcd", "0")
 ,("sparks_fizzled", "0")
 ,("work_balance", "0.000000")
 ,("n_capabilities", "1")
 ,("task_count", "4")
 ,("peak_worker_count", "3")
 ,("worker_count", "3")
 ,("gen_0_collections", "5")
 ,("gen_0_par_collections", "0")
 ,("gen_0_cpu_seconds", "0.002513")
 ,("gen_0_wall_seconds", "0.002565")
 ,("gen_0_max_pause_seconds", "0.000854")
 ,("gen_0_avg_pause_seconds", "0.000513")
 ,("gen_1_collections", "4")
 ,("gen_1_par_collections", "0")
 ,("gen_1_cpu_seconds", "0.003401")
 ,("gen_1_wall_seconds", "0.003437")
 ,("gen_1_max_pause_seconds", "0.002737")
 ,("gen_1_avg_pause_seconds", "0.000859")
 ]

Rather than a separate flag for each, something like --rts-stats=max_bytes_used,peak_megabytes_allocated perhaps?

What do you think?
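As a rough illustration of how a --rts-stats selection could work, here is a minimal Python sketch. It assumes the list above is the text the GHC runtime emits for its machine-readable stats, and it leans on the fact that this particular list of string pairs also happens to be a valid Python literal (that will not hold for every Haskell string escape). The function names parse_rts_stats and select_stats are hypothetical and not part of quickbench.

```python
import ast

def parse_rts_stats(text):
    """Parse the machine-readable stats list into a dict of strings."""
    # Assumption: the list of ("name", "value") pairs is also a valid Python literal.
    pairs = ast.literal_eval(text.strip())
    # Keys are stripped because some (e.g. "sparks_dud ") carry a stray space.
    return {key.strip(): value for key, value in pairs}

def select_stats(stats, spec):
    """Pick the stats named in a comma-separated spec, in the order given."""
    names = [name.strip() for name in spec.split(",") if name.strip()]
    missing = [name for name in names if name not in stats]
    if missing:
        raise KeyError("unknown RTS stats: " + ", ".join(missing))
    return [(name, stats[name]) for name in names]

# A shortened copy of the stats list from this comment.
sample = """
 [("bytes allocated", "25546720")
 ,("max_bytes_used", "836352")
 ,("peak_megabytes_allocated", "8")
 ,("sparks_dud ", "0")
 ]
"""
stats = parse_rts_stats(sample)
selected = select_stats(stats, "max_bytes_used,peak_megabytes_allocated")
print(selected)
```

Rejecting unknown names up front, rather than silently printing empty columns, would make typos in the option value easy to spot.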

Mar 03 '25 07:03 thomie

Woo that's a lot! Let's keep it simple, max_bytes_used looks the most useful.

This is quite a specific feature, but useful to Haskell devs like us. How about calling it -m/--mem (or -m/--mem-ghc if we want to be more accurate)?

Apr 05 '25 10:04 simonmichael

If used with a non-Haskell program, or a Haskell program not compiled the right way, what happens?

Is it worth using GHC's RTS stats, or could we get the same info by a more general technique, not depending on GHC?

Apr 05 '25 10:04 simonmichael

I think "bytes allocated" could be quite useful as well, as a proxy for runtime.

For example from https://hasura.io/blog/hasura-and-well-typed-collaborate-on-haskell-tooling#fn1:

The number of bytes allocated acts as a proxy for the amount of computation performed, since Haskell programs tend to allocate frequently, and allocations are more consistent than CPU or wall clock time.

I can't find a better reference now, but it's what the GHC project itself used to do (and may still do). It would be a partial answer to your question in https://github.com/simonmichael/hledger/issues/2122#issuecomment-1973628441:

PS any ideas for simple robust automated performance testing

Apr 06 '25 14:04 thomie

If used with a non-Haskell program, what happens?

You get some inexplicable error message:

hGetLine: end of file

or a Haskell program not compiled the right way,

error: hledger-1.42: Most RTS options are disabled. Link with -rtsopts to enable them.

I guess it would be better to first check whether the program is a Haskell program and whether it's linked with -rtsopts (by checking +RTS --info), and, if it isn't, show a warning and ignore the -m flag. TODO.
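A minimal Python sketch of such a check, under stated assumptions: it assumes a GHC-built program answers +RTS --info with a list of string pairs that includes ("GHC RTS", "YES"), and it recognises the disabled-options case from the error text quoted above rather than from --info, since it is not clear that --info reports the -rtsopts setting. The function names are hypothetical, and passing +RTS --info to an arbitrary non-Haskell program may have side effects of its own.

```python
import ast
import subprocess

def is_ghc_program(info_output):
    """Return True if the text looks like `+RTS --info` output from a GHC program."""
    try:
        # Assumption: the info list of string pairs is also a valid Python literal.
        pairs = ast.literal_eval(info_output.strip())
        return dict(pairs).get("GHC RTS") == "YES"
    except (ValueError, SyntaxError, TypeError):
        # Anything that does not parse as a list of pairs is treated as "not GHC".
        return False

def rtsopts_disabled(stderr_text):
    """Detect the RTS error shown when the program was not linked with -rtsopts."""
    return "Most RTS options are disabled" in stderr_text

def probe(program):
    """Run `program +RTS --info` and report whether it looks like a GHC program."""
    try:
        result = subprocess.run(
            [program, "+RTS", "--info"],
            capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return is_ghc_program(result.stdout)
```

With something like this, quickbench could print a warning and skip the memory column instead of failing with "hGetLine: end of file".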

--mem-ghc

The GHC runtime system collects several different memory measurements, so just "--mem-ghc" is perhaps not accurate enough (is it max or average or total allocation?).

A --rts-stats option would be the most flexible.

Apr 06 '25 15:04 thomie

Currently, quickbench is positioned as a really quick and easy (quick and dirty, some might say) reporting tool - "a better time". I think adding a lot of specialist measurements doesn't fit with this. Do we think it's worthwhile/affordable to expand the scope? Our competition would be tools like bench (Haskell) and hyperfine (Rust).

Apr 09 '25 19:04 simonmichael

I'm not against adding a simple "memory" measurement for Haskell programs, as I'd personally find that very handy, but I wonder how far to go.

Apr 09 '25 20:04 simonmichael

I'd also love to show "transactions per second" in these reports when benchmarking hledger or other PTA apps. Supporting such custom metrics, somehow, would be another nice feature (and scope expansion).

Apr 09 '25 20:04 simonmichael

We could add a --custom-metrics-parser option to quickbench.

Its argument should be the path to a program that parses the output of the program under test and prints custom metrics in some standardized format (JSON/CSV). quickbench would then take those metrics and report them alongside the time measurements.

In hledger, there would be a hledger-metrics.sh script which, for a single run, outputs for example:

{"memory": 123000000, "transactions per second": 20000}

Then quickbench --custom-metrics-parser=hledger-metrics.sh bench.sh reports:

                                            time  memory     transactions per second
-f examples/1ktxns-1kaccts.journal balance  0.21  123000000  20000
...                                         ...   ...        ...
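To make the data flow concrete, here is a minimal Python sketch of the consuming side. It assumes the metrics parser prints one flat JSON object per run, as in the example above; parse_metrics and report are hypothetical names, and the tab-separated layout is only one possible rendering.

```python
import json

def parse_metrics(parser_output):
    """Parse one run's metrics: a flat JSON object mapping name to value."""
    metrics = json.loads(parser_output)
    if not isinstance(metrics, dict):
        raise ValueError("expected a JSON object of metrics")
    return metrics

def report(rows):
    """Render (command, seconds, metrics) rows as tab-separated lines."""
    # Collect metric names in first-seen order so the columns stay stable.
    names = []
    for _, _, metrics in rows:
        for name in metrics:
            if name not in names:
                names.append(name)
    lines = ["\t".join(["", "time"] + names)]
    for command, seconds, metrics in rows:
        cells = [command, f"{seconds:.2f}"]
        # A metric missing from a run is shown as an empty cell.
        cells += [str(metrics.get(name, "")) for name in names]
        lines.append("\t".join(cells))
    return "\n".join(lines)

metrics = parse_metrics('{"memory": 123000000, "transactions per second": 20000}')
table = report([("-f examples/1ktxns-1kaccts.journal balance", 0.21, metrics)])
print(table)
```

Keeping the metric names free-form means quickbench would not need to know anything about hledger; the script alone decides what a "transaction per second" is.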

Apr 10 '25 07:04 thomie

Increasingly off topic, but while we're brainstorming: I've wished for an easy display of changes (perhaps only between two executables). E.g., to summarise results for two hledger versions I made this one by hand:

command       1k txns    10k txns  100k txns
time:
all commands  =          =         =
memory:
print         +slightly  -10%      -1%
register      -10%       -18%      -5%
balance       +slightly  -13%      -5%
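A summary like this could be derived from two measurements per cell. The Python sketch below is one possible rule, not anything quickbench does today: the function name describe_change and both thresholds (0.5% for "=" and 2% for "slightly") are assumptions chosen for illustration.

```python
def describe_change(old, new, same_below=0.5, slight_below=2.0):
    """Summarise new vs old as "=", "+slightly"/"-slightly", or a signed percent."""
    if old == 0:
        # A relative change is undefined when the baseline is zero.
        return "=" if new == 0 else "n/a"
    percent = (new - old) / old * 100
    sign = "+" if percent > 0 else "-"
    if abs(percent) < same_below:
        return "="
    if abs(percent) < slight_below:
        return sign + "slightly"
    return f"{sign}{abs(percent):.0f}%"

# The 4ktxns row of the example output at the top: 14.61M vs 11.49M.
print(describe_change(14.61, 11.49))  # prints -21%
```

For timings, the "=" threshold would probably need to be wider than for memory, since wall clock time is noisier between runs.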

May 17 '25 20:05 simonmichael