
Memory profiling

Status: Open. benjeffery opened this issue 1 year ago · 2 comments

We added time benchmarks in #2454; it would also be useful to record the peak memory usage.

This page lists a few ways to do this which look interesting:

  • grep ^VmPeak /proc/113/status
  • /usr/bin/time -v {cmd} | grep "Maximum resident set size"
  • valgrind --tool=massif --pages-as-heap=yes --massif-out-file=massif.out {CMD}; grep mem_heap_B massif.out | sed -e 's/mem_heap_B=\(.*\)/\1/' | sort -g | tail -n 1

Figure out which of these is best to use and incorporate it into the python/benchmark/run.py script.
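As a starting point for the run.py integration, the `/usr/bin/time -v` approach could be wrapped along these lines. This is only a sketch: `peak_rss_kb` and `parse_max_rss` are hypothetical helper names, and it assumes `/usr/bin/time` is GNU time (not the BSD variant shipped on macOS).

```python
import re
import subprocess


def parse_max_rss(time_output):
    """Extract the 'Maximum resident set size' figure (in kbytes)
    from GNU time's -v report."""
    m = re.search(r"Maximum resident set size \(kbytes\): (\d+)", time_output)
    if m is None:
        raise ValueError("no max RSS line found; is this GNU time -v output?")
    return int(m.group(1))


def peak_rss_kb(cmd):
    """Run cmd under GNU time -v and return its peak RSS in kilobytes.

    Hypothetical helper; GNU time writes its report to stderr, so we
    capture that rather than stdout.
    """
    proc = subprocess.run(
        ["/usr/bin/time", "-v"] + cmd,
        capture_output=True,
        text=True,
    )
    return parse_max_rss(proc.stderr)
```

Parsing stderr separately from the benchmarked command's own output keeps the measurement from interfering with any results the command prints.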

benjeffery avatar Aug 04 '22 11:08 benjeffery

Decided to test these while I had my head in it:

  • grep ^VmPeak /proc/113/status
  • Not ideal, as it needs to be run while the process is still alive
  • /usr/bin/time -v {cmd} | grep "Maximum resident set size"
    • Process doesn't take any longer than normal
  • Results are roughly consistent but show some variability:
      • pass 20 runs: min: 9,520k, max: 9,704k
      • import numpy;numpy.arange(20_000_000, dtype=numpy.int8) 20 runs: min: 48,380k, max 48,832k
      • import numpy;numpy.arange(2_000_000, dtype=numpy.int8) 20 runs: min: 30,900k, max 31,072k
      • import numpy;numpy.arange(1, dtype=numpy.int8) 20 runs: min: 29,028k, max 29,512k
  • valgrind --tool=massif
  • Process takes longer (2s vs 0.14s for this simple example)
  • Results are much more consistent: only 1 run of 20 gave a different answer, and it differed by only 4k bytes. Reading the docs, this is because, although massif is snapshot based, snapshots are triggered by deallocations.
    • pass: 23,658,496
    • import numpy;numpy.arange(20_000_000, dtype=numpy.int8): 439,316,480
    • import numpy;numpy.arange(2_000_000, dtype=numpy.int8): 421,314,560
    • import numpy;numpy.arange(1, dtype=numpy.int8): 419,311,616
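The grep/sed/sort pipeline used above to pull the peak heap size out of a massif file could be done in Python instead, which might be easier to fold into run.py. A sketch, with `massif_peak_heap` as a hypothetical helper name, assuming the default `mem_heap_B=<bytes>` line format of massif output:

```python
def massif_peak_heap(massif_text):
    """Return the peak mem_heap_B value (in bytes) from the text of a
    massif output file.

    Python equivalent of:
      grep mem_heap_B massif.out | sed -e 's/mem_heap_B=\(.*\)/\1/' \
        | sort -g | tail -n 1
    """
    peaks = [
        int(line.split("=", 1)[1])
        for line in massif_text.splitlines()
        if line.startswith("mem_heap_B=")
    ]
    if not peaks:
        raise ValueError("no mem_heap_B lines found in massif output")
    return max(peaks)
```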

The valgrind numbers are so clean that I think it's worth seeing whether the benchmark suite will run in a sensible amount of time using it. The nice thing about memory benchmarks is that we can parallelise them if needed.

benjeffery avatar Aug 04 '22 13:08 benjeffery

FWIW, I've found the /usr/bin/time approach to be robust and reliable. The only slightly annoying thing is that you need GNU time, which macOS users have to jump through some hoops to install. You could probably use psutil or memory-profiler if you wanted to do this in Python.

We'd be rounding these numbers to (at least) the megabyte, so a bit of variability is fine.

jeromekelleher avatar Aug 08 '22 08:08 jeromekelleher