Draft: feat: --max-bytes-used (#2)
When given the --max-bytes-used flag, quickbench now also reports the maximum residency of the program under test. This works only for Haskell programs, and they need to be linked with -rtsopts.
Example output:
+----------------------------------------------++---------------------------++-----------------------------+
|                                              ||                  Time (s) ||              Max bytes used |
+==============================================++===========================++=============================+
|                                              || hledger-1.40 hledger-1.41 ||  hledger-1.40  hledger-1.41 |
+==============================================++===========================++=============================+
| -f examples/1ktxns-1kaccts.journal balance   ||         0.21         0.21 ||         3.84M         4.28M |
| -f examples/2ktxns-1kaccts.journal balance   ||         0.35         0.33 ||         7.43M         6.39M |
| -f examples/3ktxns-1kaccts.journal balance   ||         0.49         0.49 ||        10.48M        11.59M |
| -f examples/4ktxns-1kaccts.journal balance   ||         0.53         0.36 ||        14.61M        11.49M |
| -f examples/5ktxns-1kaccts.journal balance   ||         0.47         0.41 ||        18.32M        15.52M |
| -f examples/6ktxns-1kaccts.journal balance   ||         0.47         0.49 ||        21.72M        21.60M |
| -f examples/7ktxns-1kaccts.journal balance   ||         0.55         0.57 ||        22.35M        25.17M |
| -f examples/8ktxns-1kaccts.journal balance   ||         0.61         0.64 ||        22.28M        24.02M |
| -f examples/9ktxns-1kaccts.journal balance   ||         0.70         0.67 ||        31.41M        24.03M |
| -f examples/10ktxns-1kaccts.journal balance  ||         0.77         0.78 ||        36.00M        35.56M |
| -f examples/20ktxns-1kaccts.journal balance  ||         1.52         1.55 ||        72.62M        72.62M |
| -f examples/30ktxns-1kaccts.journal balance  ||         2.19         2.29 ||        85.87M        96.99M |
| -f examples/40ktxns-1kaccts.journal balance  ||         2.91         3.04 ||       120.17M       130.21M |
| -f examples/50ktxns-1kaccts.journal balance  ||         3.60         3.62 ||       129.01M       140.33M |
| -f examples/60ktxns-1kaccts.journal balance  ||         4.28         4.44 ||       162.47M       175.05M |
| -f examples/70ktxns-1kaccts.journal balance  ||         4.99         5.07 ||       195.87M       207.53M |
| -f examples/80ktxns-1kaccts.journal balance  ||         5.60         5.76 ||       211.37M       219.93M |
| -f examples/90ktxns-1kaccts.journal balance  ||         6.34         6.49 ||       241.11M       250.92M |
| -f examples/100ktxns-1kaccts.journal balance ||         6.90         7.01 ||       255.24M       264.18M |
+----------------------------------------------++---------------------------++-----------------------------+
Hello @simonmichael,
In #6, I just added --max-bytes-used.
It would be quite easy to support any or all of the following GHC runtime stats:
[("bytes allocated", "25546720")
,("num_GCs", "9")
,("average_bytes_used", "280828")
,("max_bytes_used", "836352")
,("num_byte_usage_samples", "4")
,("peak_megabytes_allocated", "8")
,("init_cpu_seconds", "0.000944")
,("init_wall_seconds", "0.000727")
,("mut_cpu_seconds", "0.019297")
,("mut_wall_seconds", "0.762342")
,("GC_cpu_seconds", "0.005914")
,("GC_wall_seconds", "0.006002")
,("exit_cpu_seconds", "0.000326")
,("exit_wall_seconds", "0.001238")
,("total_cpu_seconds", "0.026521")
,("total_wall_seconds", "0.770320")
,("major_gcs", "4")
,("allocated_bytes", "25546720")
,("max_live_bytes", "836352")
,("max_large_objects_bytes", "102448")
,("max_compact_bytes", "0")
,("max_slop_bytes", "45504")
,("max_mem_in_use_bytes", "8388608")
,("cumulative_live_bytes", "1123312")
,("copied_bytes", "4343224")
,("par_copied_bytes", "0")
,("cumulative_par_max_copied_bytes", "0")
,("cumulative_par_balanced_copied_bytes", "0")
,("fragmentation_bytes", "0")
,("alloc_rate", "1323842590")
,("productivity_cpu_percent", "0.729142")
,("productivity_wall_percent", "0.989658")
,("bound_task_count", "1")
,("sparks_count", "0")
,("sparks_converted", "0")
,("sparks_overflowed", "0")
,("sparks_dud ", "0")
,("sparks_gcd", "0")
,("sparks_fizzled", "0")
,("work_balance", "0.000000")
,("n_capabilities", "1")
,("task_count", "4")
,("peak_worker_count", "3")
,("worker_count", "3")
,("gen_0_collections", "5")
,("gen_0_par_collections", "0")
,("gen_0_cpu_seconds", "0.002513")
,("gen_0_wall_seconds", "0.002565")
,("gen_0_max_pause_seconds", "0.000854")
,("gen_0_avg_pause_seconds", "0.000513")
,("gen_1_collections", "4")
,("gen_1_par_collections", "0")
,("gen_1_cpu_seconds", "0.003401")
,("gen_1_wall_seconds", "0.003437")
,("gen_1_max_pause_seconds", "0.002737")
,("gen_1_avg_pause_seconds", "0.000859")
]
Rather than a separate flag for each, something like --rts-stats=max_bytes_used,peak_megabytes_allocated perhaps?
What do you think?
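A minimal sketch of how such an option could pick fields out of the list above, assuming the list arrives as a string in exactly this shape (for example from `+RTS -t --machine-readable`). The names `parseRtsStats`, `splitNames` and `selectStats` are made up for illustration and are not quickbench's actual code. Since the output is valid Haskell syntax for `[(String, String)]`, `reads` from base is enough to parse it:

```haskell
import Data.Char (isSpace)
import Data.Maybe (listToMaybe)

-- | Parse the stats list. Hypothetical helper: it relies only on the
-- output being readable as a Haskell value of type [(String, String)].
parseRtsStats :: String -> Maybe [(String, String)]
parseRtsStats s =
  listToMaybe [stats | (stats, rest) <- reads s, all isSpace rest]

-- | Split a comma-separated option value such as
-- "max_bytes_used,peak_megabytes_allocated" into stat names,
-- dropping empty entries.
splitNames :: String -> [String]
splitNames = filter (not . null) . go
  where
    go s = case break (== ',') s of
      (name, [])       -> [name]
      (name, _ : rest) -> name : go rest

-- | Look up each requested stat. Nothing marks a name that the
-- runtime did not report, so the caller can warn about it.
selectStats :: [String] -> [(String, String)] -> [(String, Maybe String)]
selectStats names stats = [(n, lookup n stats) | n <- names]
```

With the stats listed above, `selectStats (splitNames "max_bytes_used,peak_megabytes_allocated") stats` would yield `[("max_bytes_used", Just "836352"), ("peak_megabytes_allocated", Just "8")]`.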
Woo that's a lot! Let's keep it simple, max_bytes_used looks the most useful.
This is quite a specific feature, but useful to Haskell devs like us. How about calling it -m/--mem (or -m/--mem-ghc if we want to be more accurate)?
If used with a non-Haskell program, or a Haskell program not compiled the right way, what happens?
Is it worth using GHC's RTS stats, or could we get the same info by a more general technique, not depending on GHC?
I think "bytes allocated" could be quite useful as well, as a proxy for runtime.
For example from https://hasura.io/blog/hasura-and-well-typed-collaborate-on-haskell-tooling#fn1:
> The number of bytes allocated acts as a proxy for the amount of computation performed, since Haskell programs tend to allocate frequently, and allocations are more consistent than CPU or wall clock time.
I can't find a better reference now, but it's what the GHC project itself used to do / does. It would be a partial answer to your question in https://github.com/simonmichael/hledger/issues/2122#issuecomment-1973628441:
> PS any ideas for simple robust automated performance testing
> If used with a non-Haskell program, what happens?
You get some inexplicable error message:
hGetLine: end of file
> or a Haskell program not compiled the right way,
error: hledger-1.42: Most RTS options are disabled. Link with -rtsopts to enable them.
I guess it would be better to first check whether the program is a Haskell program and whether it's linked with -rtsopts (by checking +RTS --info), then show a warning and ignore the -m flag if it isn't. TODO.
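A rough sketch of the first half of that check, assuming the `+RTS --info` output has already been captured as a string and that it contains a `("GHC RTS", "YES")` pair in the same list-of-pairs syntax as the stats above. The function names are hypothetical. This does not tell us whether -rtsopts is enabled; that would need a separate probe, which is left open here:

```haskell
import Data.Char (isSpace)
import Data.Maybe (listToMaybe)

-- | Parse the key/value list printed by `prog +RTS --info`.
-- Assumption: the output is readable as [(String, String)].
parseRtsInfo :: String -> Maybe [(String, String)]
parseRtsInfo s =
  listToMaybe [kv | (kv, rest) <- reads s, all isSpace rest]

-- | True when the captured output looks like it came from a
-- GHC-compiled program. Anything that fails to parse (typically a
-- non-Haskell program's usage message, or empty output) gives False.
looksLikeGhcProgram :: String -> Bool
looksLikeGhcProgram out =
  case parseRtsInfo out of
    Just kv -> lookup "GHC RTS" kv == Just "YES"
    Nothing -> False
```

quickbench could then warn and skip the memory column when this returns `False`, instead of failing with `hGetLine: end of file`.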
> --mem-ghc
The GHC runtime system collects several different memory measurements, so just "--mem-ghc" is perhaps not precise enough (is it max, average, or total allocation?).
A --rts-stats option would be the most flexible.
Currently, quickbench is positioned as a really quick and easy (quick and dirty, some might say) reporting tool - "a better time". I think adding a lot of specialist measurements doesn't fit with this. Do we think it's worthwhile/affordable to expand the scope? Our competition would be tools like bench (Haskell) and hyperfine (Rust).
I'm not against adding a simple "memory" measurement for haskell programs, as I'd personally find that very handy, but I wonder how far to go.
I'd also love to show "transactions per second" in these reports when benchmarking hledger or other PTA apps. Supporting such custom metrics, somehow, would be another nice feature (and scope expansion).
We could add a --custom-metrics-parser option to quickbench.
Its argument would be the path to a program that parses the output of the program under test and prints custom metrics in some standardized format (JSON/CSV). quickbench would then take those metrics and report them alongside the time measurements.
In hledger, there would be a hledger-metrics.sh script which, for a single run, outputs for example:
{"memory": 123000000, "transactions per second": 20000}
Then quickbench --custom-metrics-parser=hledger-metrics.sh bench.sh reports:
| | time | memory | transactions per second |
|---|---|---|---|
| -f examples/1ktxns-1kaccts.journal balance | 0.21 | 123000000 | 20000 |
| ... | ... | ... | ... |
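A small sketch of the quickbench side of this idea, assuming the CSV variant with one `name,value` line per metric (JSON would need a library such as aeson, so CSV keeps the example within base). The `Metric` type and function names are hypothetical:

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- | One custom metric: a column name and a numeric value.
type Metric = (String, Double)

-- | Parse one "name,value" line. Lines without a comma, with an empty
-- name, or with a non-numeric value yield Nothing.
parseMetricLine :: String -> Maybe Metric
parseMetricLine line =
  case break (== ',') line of
    (name, ',' : value)
      | not (null name) -> (,) name <$> readMaybe value
    _ -> Nothing

-- | Parse the whole output of the metrics program, silently skipping
-- malformed lines (a real implementation might warn instead).
parseMetrics :: String -> [Metric]
parseMetrics = mapMaybe parseMetricLine . lines
```

For the example above, the script would print `memory,123000000` and `transactions per second,20000` on separate lines, and each parsed name would become a column header in the report.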
Increasingly off topic, but while we're brainstorming: I've wished for an easy display of changes (perhaps only between two executables). E.g. to summarise results for two hledger versions I made this one by hand:
| command | 1k txns | 10k txns | 100k txns |
|---|---|---|---|
| time: | | | |
| all commands | = | = | = |
| memory: | | | |
| | +slightly | -10% | -1% |
| register | -10% | -18% | -5% |
| balance | +slightly | -13% | -5% |
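The cells in a table like this could be generated with something along these lines. It is only a sketch: the function name and the thresholds (under 0.5% counts as "=", under 1% as "slightly") are assumptions chosen to resemble the hand-made table, not an agreed design:

```haskell
-- | Summarise the change from an old to a new measurement as a short
-- label: "=", "+slightly"/"-slightly", or a rounded percentage.
-- The 0.5% and 1% thresholds are illustrative assumptions.
showChange :: Double -> Double -> String
showChange old new
  | old <= 0      = "n/a"   -- relative change is undefined
  | abs pct < 0.5 = "="
  | abs pct < 1   = sign ++ "slightly"
  | otherwise     = sign ++ show (round (abs pct) :: Int) ++ "%"
  where
    pct  = (new - old) * 100 / old
    sign = if pct < 0 then "-" else "+"
```

For instance, the 100k-transaction memory figures from the example output in the PR description (255.24M for hledger-1.40, 264.18M for hledger-1.41) are about 3.5% apart, so `showChange 255.24 264.18` gives `"+4%"`.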