tskit
Memory profiling
We added time benchmarks in #2454; it would also be useful to record the peak memory usage.
This page lists a few ways to do this which look interesting:
- `grep ^VmPeak /proc/113/status`
- `/usr/bin/time -v {cmd} | grep "Maximum resident set size"`
- `valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out {CMD}; grep mem_heap_B massif.out | sed -e 's/mem_heap_B=\(.*\)/\1/' | sort -g | tail -n 1`
Figure out which of these is best to use and incorporate into the python/benchmark/run.py script.
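One more stdlib-only option worth considering for `run.py` (an assumption on my part, not one of the methods above): after a child process exits, `getrusage` reports its peak RSS directly, so no GNU `time` binary or output parsing is needed. A minimal sketch, assuming Linux where `ru_maxrss` is in kilobytes (macOS reports bytes):

```python
# Sketch only: measure a finished child's peak RSS via getrusage, as a
# possible alternative to parsing GNU time output. Assumes Linux, where
# ru_maxrss is reported in kilobytes (macOS reports bytes). Caveat:
# RUSAGE_CHILDREN is a high-water mark across *all* reaped children, so
# each benchmark should run from a fresh parent process for clean numbers.
import resource
import subprocess
import sys


def child_peak_rss_kb(cmd):
    """Run cmd to completion and return the children's peak RSS in kB."""
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


if __name__ == "__main__":
    print(child_peak_rss_kb([sys.executable, "-c", "pass"]), "kB")
```

The RUSAGE_CHILDREN caveat means this fits a one-subprocess-per-benchmark runner, but not a loop measuring several commands from the same parent.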
Decided to test these while I had my head in it:
`grep ^VmPeak /proc/113/status`
- Not ideal, as you need to read this while the process is still running
`/usr/bin/time -v {cmd} | grep "Maximum resident set size"`
- Process doesn't take any longer than normal
- Results seem roughly consistent but have variability:
  - `pass`: 20 runs: min: 9,520k, max: 9,704k
  - `import numpy;numpy.arange(20_000_000, dtype=numpy.int8)`: 20 runs: min: 48,380k, max: 48,832k
  - `import numpy;numpy.arange(2_000_000, dtype=numpy.int8)`: 20 runs: min: 30,900k, max: 31,072k
  - `import numpy;numpy.arange(1, dtype=numpy.int8)`: 20 runs: min: 29,028k, max: 29,512k
`valgrind --tool=massif`
- Process takes longer (2s vs 0.14s for this simple example)
- Result is much more consistent: only 1 run of 20 gave a different answer, and it differed by only 4k bytes. Reading the docs, this is because although massif is snapshot-based, snapshots are triggered by deallocations.
  - `pass`: 23,658,496
  - `import numpy;numpy.arange(20_000_000, dtype=numpy.int8)`: 439,316,480
  - `import numpy;numpy.arange(2_000_000, dtype=numpy.int8)`: 421,314,560
  - `import numpy;numpy.arange(1, dtype=numpy.int8)`: 419,311,616
The valgrind numbers are so clean, I think it is worth seeing if the benchmark suite will run in a sensible amount of time using that. The nice thing about memory benchmarks is that we can parallelise if needed, though.
FWIW, I've found the /usr/bin/time approach to be robust and reliable. The only slightly annoying thing is that you need GNU time, which mac users have to jump through some hoops for. You could probably use psutil or memory-profiler if you wanted to do this in Python.
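For the pure-Python route without extra dependencies, the `VmPeak` approach can also be done from inside the benchmarked process itself by reading `/proc/self/status`, which sidesteps the "need to read it while the process is running" problem. A Linux-only sketch (field names are from `proc(5)`; `VmHWM` is peak resident set, `VmPeak` is peak virtual size, both in kB):

```python
# Sketch: read peak memory straight from /proc/<pid>/status instead of
# shelling out. Linux-only. VmHWM = peak resident set size, VmPeak =
# peak virtual memory size; both reported in kB.
def proc_status_kb(field, pid="self"):
    """Return the value in kB of a /proc/<pid>/status field, e.g. "VmHWM"."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])  # e.g. "VmHWM:    29512 kB"
    raise KeyError(field)


if __name__ == "__main__":
    print(proc_status_kb("VmHWM"), "kB")
```

The obvious downside is that it measures the runner's own process, so each benchmark would need to run in a fresh interpreter for the numbers to mean anything.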
We'd be rounding these numbers to (at least) the megabyte, so a bit of variability is fine.