Reset _total_stats before each run
_total_stats were not being reset before each run, meaning that values were inflated by a factor of R for a benchmark with R runs (e.g., by 5x for the default of 5 runs).
This affected only non-time metrics tracked in perf_stats: allocs, tasks and instruction count.
I was hunting down some nonsensical instruction counts in a perf test, and this turned out to be the cause.
Note that this only affects tests that use perf_tests::start_measuring_time(); tests which instead implicitly measure the entire run don't use _total_stats.
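For illustration, here is a minimal, self-contained sketch of the bug pattern and the fix. All names here (the simplified perf_stats fields, single_run, benchmark) are hypothetical stand-ins for exposition, not the actual Seastar code:

```cpp
#include <cstdint>
#include <iostream>

// Simplified stand-in for the per-run counters (hypothetical, for
// illustration only; not the actual Seastar perf_stats).
struct perf_stats {
    uint64_t allocations = 0;
    uint64_t tasks_executed = 0;
    uint64_t instructions = 0;
};

struct benchmark {
    perf_stats _total_stats;

    void single_run() {
        // stand-in for one run's measured work
        _total_stats.allocations += 3;
        _total_stats.tasks_executed += 4;
        _total_stats.instructions += 1000;
    }

    void run(unsigned runs) {
        for (unsigned i = 0; i < runs; ++i) {
            _total_stats = {}; // the fix: without this reset, run i
                               // reports (i + 1) times the true values
            single_run();
            std::cout << "run " << i << ": "
                      << _total_stats.allocations << " allocs, "
                      << _total_stats.tasks_executed << " tasks, "
                      << _total_stats.instructions << " inst\n";
        }
    }
};

int main() {
    // with the reset each run prints 3 / 4 / 1000; without it the fifth
    // run would print 15 / 20 / 5000, the 5x inflation described above
    benchmark{}.run(5);
}
```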
@tgrabiec please review, it's your code IIRC
I think @bhalevy has also been in this area recently and may have added the perf_stats functionality (very nice, BTW).
> I think @bhalevy has also been in this area recently and may have added the perf_stats functionality (very nice, BTW).
I just copied it from scylla :)
Looks good to me. Can you please run the existing tests and quote before-and-after numbers to see if and what changed, to make sure they still make sense?
(Please remember to set your host's cpufreq governor to userspace at the lowest frequency to eliminate cpu core temperature effects.)
> Can you please run the existing tests and quote before-and-after numbers to see if and what changed, to make sure they still make sense?
You mean in the seastar repo, right? I can do that if you'd like, but none of the existing tests use start_measuring_time so I don't expect any difference since _total_stats is not used unless that method is used.
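To make that point concrete, here is a hypothetical, simplified model of the relationship (assumed names and behavior, not the actual framework): counters accumulate into _total_stats only while an explicit measurement window is open, so implicit-timing tests never touch it.

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical model, not the real Seastar harness: events are only
// counted into _total_stats while an explicit window is open.
struct harness {
    uint64_t _total_stats = 0;  // simplified to a single counter
    bool _in_window = false;

    void start_measuring_time() { _in_window = true; }
    void stop_measuring_time() { _in_window = false; }

    // called for every counted event (alloc, task, instruction sample)
    void count_event() {
        if (_in_window) {
            ++_total_stats;  // only explicit-window tests reach this
        }
    }
};

int main() {
    harness h;
    h.count_event();            // implicit-timing style: not counted
    h.start_measuring_time();
    h.count_event();            // explicit-window style: counted
    h.stop_measuring_time();
    std::cout << h._total_stats << "\n";  // prints 1
}
```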
@avikivity It's not mine.
> > Can you please run the existing tests and quote before-and-after numbers to see if and what changed, to make sure they still make sense?
>
> You mean in the seastar repo, right?
yes
> I can do that if you'd like, but none of the existing tests use start_measuring_time so I don't expect any difference since _total_stats is not used unless that method is used.
understood, but let's verify that empirically to make sure we haven't missed anything.
I had trouble running fair_queue_perf, as described in https://github.com/scylladb/seastar/issues/1175; the fix is in the associated PR.
@bhalevy
I have now run all the existing _perf tests before and after this change (both runs included the cherry-picked fix for #1175), like so:
for t in $PTESTS; do echo "running $t" && tests/perf/$t --smp=2 || { echo "FAILED"; break; }; done | tee -a before.txt
(the after run went to after.txt)
There was some timing variation, as is to be expected, though generally modest. Timing does not come from _total_stats, so I think we can neglect it.
I then focused on the last three columns, allocs, tasks and inst, which come from _total_stats:
diff -s <(cat before.txt | tr -s ' ' | cut -d' ' -f7-9) <(cat after.txt | tr -s ' ' | cut -d' ' -f7-9)
Result (recall the three columns are allocs tasks inst):
39c39
< 3.500 3.000 993.5
---
> 3.500 3.000 993.4
47,48c47,48
< 3.000 4.000 1128.6
< 2.000 2.500 716.9
---
> 3.000 4.000 1128.5
> 2.000 2.500 716.8
61c61
< 146.000 0.000 2847533.4
---
> 146.000 0.000 2847241.6
65,66c65,66
< 145.000 0.000 1983031.7
< 136.000 0.000 13100595.3
---
> 145.000 0.000 1983031.6
> 136.000 0.000 13100594.6
69,70c69,70
< 188.000 0.000 26000196.5
< 10.000 0.000 17020168.3
---
> 188.000 0.000 26001157.3
> 10.000 0.000 17020168.2
77,78c77,78
< max 665.5 op/s
< max 45619736.3 op/s
---
> max 666.1 op/s
> max 46096694.2 op/s
There are some differences, mostly attributable to small instruction count variations, which is to be expected as this is not completely deterministic. The larger differences appear to occur when more or fewer tasks were executed, i.e., a difference in suspension-point timing, so I think we can ignore these as well.
Is that sufficient?
Thanks Benny for the review, it's appreciated.