luceneutil
Use Rob's cool vmstat/gnuplot tooling to plot hardware utilization
@rmuir has some cool tooling that measures how effectively processes are using CPU/RAM/IO. Let's turn this on at least during the indexing phases of the nightly benchy, when we are trying to fully saturate the available hardware resources...
@mikemccand to start with "text mode", just do this:
start of benchy:
$ nohup vmstat --active --wide --timestamp --unit M 1 > vmstat.log 2>/dev/null &
end of benchy:
$ pkill vmstat
it should produce a human-readable log looking like this:
router:~$ vmstat --active --wide --timestamp --unit M 1
--procs-- -----------------------memory---------------------- ---swap-- -----io---- -system-- ----------cpu---------- -----timestamp-----
r b swpd free inact active si so bi bo in cs us sy id wa st gu EDT
1 0 0 3177 89 140 0 0 2 6 277 0 0 0 100 0 0 0 2024-10-02 08:12:00
0 0 0 3177 89 140 0 0 0 0 323 161 0 0 100 0 0 0 2024-10-02 08:12:01
1 0 0 3177 89 140 0 0 0 956 373 222 0 0 100 0 0 0 2024-10-02 08:12:02
0 0 0 3177 89 140 0 0 0 0 93 84 0 0 100 0 0 0 2024-10-02 08:12:03
0 0 0 3177 89 140 0 0 0 8 170 134 0 0 100 0 0 0 2024-10-02 08:12:04
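For the nightly integration, here is a minimal sketch in Python (luceneutil's language) of wrapping an indexing phase with the same start/stop commands. run_indexing and the log path are placeholder names for illustration, not actual luceneutil hooks:

import subprocess

def run_with_vmstat(log_path, run_indexing):
    # run_indexing is a placeholder for whatever the nightly benchmark calls
    # to build one index; the vmstat flags mirror the command above.
    with open(log_path, 'w') as log:
        vmstat = subprocess.Popen(
            ['vmstat', '--active', '--wide', '--timestamp', '--unit', 'M', '1'],
            stdout=log, stderr=subprocess.DEVNULL)
        try:
            run_indexing()
        finally:
            # same effect as the pkill at the end of benchy
            vmstat.terminate()
            vmstat.wait()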
We can then add a fancy graph plot to the "end of benchy" step, but the text mode itself is a good way to start?
also, if you don't want output in your local time, set the TZ variable:
$ TZ=UTC vmstat ...
if you don't want inactive/active, remove the --active flag. I prefer this way of breaking down the memory, but maybe for lucene you'd prefer to just monitor the OS cache instead:
$ TZ=UTC vmstat --wide --timestamp --unit M 1
--procs-- -----------------------memory---------------------- ---swap-- -----io---- -system-- ----------cpu---------- -----timestamp-----
r b swpd free buff cache si so bi bo in cs us sy id wa st gu UTC
1 0 0 14480 178 1890 0 0 17 84 628 3 3 1 95 0 0 0 2024-10-02 12:49:15
1 0 0 14485 178 1880 0 0 0 0 391 733 0 2 98 0 0 0 2024-10-02 12:49:16
0 0 0 14494 178 1875 0 0 0 0 433 784 1 1 98 0 0 0 2024-10-02 12:49:17
0 0 0 14506 178 1873 0 0 0 0 374 1083 3 1 96 0 0 0 2024-10-02 12:49:18
OK! I did the first step, which is to run vmstat during each indexing run and save away the resulting log. Yesterday's run produced these exciting vmstat logs:
beast3:tmp$ wc -l *vmstat*
120 fastIndexBigDocs.vmstat.log
148 fastIndexMediumDocs.vmstat.log
3903 fastIndexMediumDocsWithVectorsQuantized.vmstat.log
6416 fastIndexMediumDocsWithVectors.vmstat.log
20995 fixedIndex.vmstat.log
240 nrtIndexMediumDocs.vmstat.log
31822 total
Only 120 lines (120 seconds) for fastIndexBigDocs.vmstat.log!
Next we need to somehow gnuplot these buggers?
thanks @mikemccand! can you hook me up with these sample data files? it will be faster to make it look decent.
Does this work? vmstat-logs.zip
These are all the vmstat logs, one from building each of the N different indices that the nightly benchy builds ...
thank you @mikemccand, that works. crazy how idle your CPU is while doing this.
@mikemccand see https://rmuir.org/vmstat-report/
You can grab the 3 files there to reproduce it:
- vmstat.gpi: gnuplot file to make several .svg files
- index.html.template: simple html file to lay them out
- generate.sh: script that makes a directory with these files for each log file
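(Not one of Rob's three files, just an illustrative Python sketch of the log-parsing step such tooling needs: pull the timestamp and cpu columns out of a vmstat log into a plain whitespace-separated data file that gnuplot can plot via its using clause. Column positions assume the --wide --timestamp output shown earlier.)

import sys

def vmstat_to_dat(in_path, out_path):
    # Convert a vmstat log into a data file with columns:
    # timestamp, user, sys, idle, iowait.  In the --wide --timestamp output
    # above, us/sy/id/wa are fields 13-16 and date/time are the last two.
    with open(in_path) as src, open(out_path, 'w') as dst:
        for line in src:
            parts = line.split()
            # skip the two header lines and any truncated lines
            if len(parts) < 20 or not parts[0].isdigit():
                continue
            us, sy, idle, wa = parts[12:16]
            date, time = parts[-2], parts[-1]
            dst.write(f'{date}T{time} {us} {sy} {idle} {wa}\n')

if __name__ == '__main__':
    vmstat_to_dat(sys.argv[1], sys.argv[2])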
Seems the hardware is not fully utilized. There is a direct correlation between CPU usage and the number of processes (= threads here). I guess you have some insane number of cores?
Whoa, thanks @rmuir! Nice charts! Yeah, this is beast3 with 128 cores, and one of those indexing runs (the fixedIndex one) uses only one core (mostly) so that it generates a precisely deterministic index each night, keeping the search results "comparable". But for the other runs, I think I expect them to be saturating CPU (ish?) ... maybe I'm not using enough concurrency for them!
This makes sense. These are just a start, we can tweak/beautify later once things are working.
I struggled a bit because the SVG files were enormous out of the box: the program is written by scientists, so the SVGs were totally lossless, accurately representing all 20,000+ datapoints in your fixedIndex, which made for huge file sizes.
I had not done interpolation before, so I had to figure it out; I tried to add comments so you can tweak any of this if you want.
Last night's benchy made the nice gnuplot vmstat charts!
It's linked off the nightly details page, to here. Each sub-directory has the charts for building that one index, e.g. fastIndexMediumDocs. It is quite depressing how idle the CPUs actually are, especially at the end of the run. We are not waiting for merges, so I suspect this is just Lucene moving the (big) in-memory segments to disk ... I'll see if I can better tune these runs to saturate the hardware.
We are not waiting for merges, so I suspect this is just Lucene moving the (big) in-memory segments to disk ... I'll see if I can better tune these runs to saturate the hardware.
@mikemccand This could be what's happening. Normally I'd expect to see some iowait here for something like that (maybe; iowait is tricky like that).
But there could be a challenge: you have 128 CPUs, so if e.g. a single CPU is in iowait, we may not see it represented in the vmstat output, because it is not even 1%?
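For concreteness: one core fully stuck in iowait on a 128-core box is only 1/128 ≈ 0.78% of total CPU time, and vmstat reports the cpu columns as integer percentages, so it would indeed round down to 0 in the wa column.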