tskit
Memory profiling
We added time benchmarks in #2454; it would also be useful to record the peak memory usage.
This page lists a few ways to do this that look interesting:

- `grep ^VmPeak /proc/113/status`
- `/usr/bin/time -v {cmd} | grep "Maximum resident set size"`
- `valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out {CMD}; grep mem_heap_B massif.out | sed -e 's/mem_heap_B=\(.*\)/\1/' | sort -g | tail -n 1`
Figure out which of these is best to use and incorporate it into the python/benchmark/run.py script.
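As a rough sketch of how the `/usr/bin/time` option could be wired into the benchmark script (assuming GNU time is installed at `/usr/bin/time`; the helper names here are my own, not anything in run.py):

```python
import re
import subprocess

# GNU time -v prints e.g. "Maximum resident set size (kbytes): 9520" to stderr
RSS_PATTERN = re.compile(r"Maximum resident set size \(kbytes\):\s*(\d+)")

def parse_peak_rss_kb(time_output):
    """Extract the peak RSS (in kB) from GNU time -v output, or None."""
    match = RSS_PATTERN.search(time_output)
    return int(match.group(1)) if match else None

def peak_rss_kb(cmd):
    """Run cmd under GNU time -v and return its peak RSS in kB.

    Requires GNU time at /usr/bin/time (not the shell builtin `time`).
    """
    result = subprocess.run(
        ["/usr/bin/time", "-v", *cmd],
        capture_output=True, text=True,
    )
    return parse_peak_rss_kb(result.stderr)
```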
Decided to test these while I had my head in it:
- `grep ^VmPeak /proc/113/status`
  - Not ideal, as you need to run it while the process is still alive
- `/usr/bin/time -v {cmd} | grep "Maximum resident set size"`
  - Process doesn't take any longer than normal
  - Results seem roughly consistent but have some variability:
    - `pass`: 20 runs, min 9,520k, max 9,704k
    - `import numpy;numpy.arange(20_000_000, dtype=numpy.int8)`: 20 runs, min 48,380k, max 48,832k
    - `import numpy;numpy.arange(2_000_000, dtype=numpy.int8)`: 20 runs, min 30,900k, max 31,072k
    - `import numpy;numpy.arange(1, dtype=numpy.int8)`: 20 runs, min 29,028k, max 29,512k
- `valgrind --tool=massif`
  - Process takes longer (2s vs 0.14s for this simple example)
  - Result is much more consistent: only 1 run of 20 gave a different answer, and it differed by only 4k bytes. Reading the docs, this is because although massif is snapshot based, snapshots are triggered by deallocations.
    - `pass`: 23,658,496
    - `import numpy;numpy.arange(20_000_000, dtype=numpy.int8)`: 439,316,480
    - `import numpy;numpy.arange(2_000_000, dtype=numpy.int8)`: 421,314,560
    - `import numpy;numpy.arange(1, dtype=numpy.int8)`: 419,311,616
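If valgrind wins, the grep/sed/sort pipeline could live in the benchmark script itself. A minimal sketch (the `massif_peak_bytes` name is mine) that reads a massif output file and returns the peak heap size:

```python
def massif_peak_bytes(path):
    """Return the largest mem_heap_B value recorded in a massif output file.

    Equivalent to:
      grep mem_heap_B massif.out | sed -e 's/mem_heap_B=\\(.*\\)/\\1/' | sort -g | tail -n 1
    """
    peak = 0
    with open(path) as f:
        for line in f:
            # massif snapshots record the heap size as lines like "mem_heap_B=439316480"
            if line.startswith("mem_heap_B="):
                peak = max(peak, int(line.split("=", 1)[1]))
    return peak
```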
The valgrind numbers are so clean that I think it is worth seeing if the benchmark suite will run in a sensible amount of time using that. The nice thing about memory benchmarks is that we can parallelise them if needed.
FWIW, I've found the `/usr/bin/time` approach to be robust and reliable. The only slightly annoying thing is that you need GNU time, which Mac users have to jump through some hoops for. You could probably use `psutil` or `memory-profiler` if you wanted to do this in Python.
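For a pure-Python route, a sampling loop with `psutil` might look like the following (a sketch only; note that polling can miss short-lived allocation spikes, which GNU time and valgrind do not):

```python
import time

import psutil  # third-party: pip install psutil

def sampled_peak_rss(cmd, interval=0.01):
    """Run cmd and poll its resident set size until it exits.

    Returns the largest RSS observed, in bytes. Because this samples at
    a fixed interval, brief peaks between samples may be missed.
    """
    proc = psutil.Popen(cmd)
    peak = 0
    while proc.poll() is None:
        try:
            peak = max(peak, proc.memory_info().rss)
        except psutil.NoSuchProcess:
            # process exited between poll() and memory_info()
            break
        time.sleep(interval)
    return peak
```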
We'd be rounding these numbers to (at least) the megabyte, so a bit of variability is fine.