dolphin
Benchmarking runtimes
Top-level issue tracking the improvements/additions needed to measure the runtime of a full-frame workflow.
Current knobs to test are:
- Using CPU vs GPU
- SLC input stack size
- Output posting (or equivalently, strides)
- Size of blocks loaded at one time from the input stack
- For CPU, number of CPUs/number of threads per CPU
- Algorithm for phase linking (MLE vs EVD)
I've put these in the order of my guess for which will have the biggest effect, but we clearly need to do the tests to see.
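To make the size of the test matrix concrete, here is a minimal sketch of enumerating the knob combinations listed above. The class, the field names, and all of the values are hypothetical placeholders for illustration, not actual dolphin options, and the CPU worker/thread knobs are assumed to apply only to CPU runs.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class BenchmarkConfig:
    """One combination of knobs. Field names are illustrative, not dolphin options."""

    use_gpu: bool
    n_slc: int
    strides: tuple
    block_size: tuple
    n_workers: int
    threads_per_worker: int
    method: str


def build_grid(gpu_options, stack_sizes, strides, block_sizes, workers, threads, methods):
    """Return every combination of knobs to benchmark."""
    configs = []
    for gpu, n_slc, stride, block, method in product(
        gpu_options, stack_sizes, strides, block_sizes, methods
    ):
        # Assumption: the worker/thread knobs only matter for CPU runs,
        # so GPU runs get a single placeholder layout.
        cpu_layouts = [(1, 1)] if gpu else list(product(workers, threads))
        for n_workers, n_threads in cpu_layouts:
            configs.append(
                BenchmarkConfig(gpu, n_slc, stride, block, n_workers, n_threads, method)
            )
    return configs


# Placeholder values; the real ranges still need to be decided.
grid = build_grid(
    gpu_options=(False, True),
    stack_sizes=(15, 30),
    strides=((1, 1), (3, 6)),
    block_sizes=((512, 512),),
    workers=(1, 4),
    threads=(1, 8),
    methods=("MLE", "EVD"),
)
```

Even with two values per knob the grid grows quickly, which is one reason to test the knobs expected to matter most first.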
Things we need for good testing
- the single-update workflow script (#11)
- recording the threads (#28)
- recording the block size/fixing the `max_ram_gb` option (#32)
- adding the ability to use EVD instead of MLE (#138)
- using `vmtouch -e` on the SLC stack before starting the workflow: https://github.com/hoytech/vmtouch . This will make sure we don't get very fast runs just because the SLC data is cached, since we can't count on that happening for the production runs.
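
As a rough sketch of the cache-eviction step, a timing harness could call `vmtouch -e` on the SLC files right before each timed run. The function names and the `workflow` callable below are hypothetical; the only external piece assumed is the `vmtouch` binary being on `PATH`.

```python
import shutil
import subprocess
import time
from pathlib import Path


def evict_command(slc_files):
    """Build the `vmtouch -e` command that evicts the given files from the page cache."""
    return ["vmtouch", "-e", *[str(Path(f)) for f in slc_files]]


def evict_from_page_cache(slc_files):
    """Run vmtouch so the next read of the SLC stack comes from disk."""
    if shutil.which("vmtouch") is None:
        raise RuntimeError("vmtouch not found on PATH")
    subprocess.run(evict_command(slc_files), check=True)


def time_cold_run(workflow, slc_files, evict=evict_from_page_cache):
    """Evict the SLC files, then return the wall time in seconds of one workflow call.

    `workflow` is any zero-argument callable (hypothetical here); `evict` can be
    swapped out, e.g. on machines where vmtouch is unavailable.
    """
    evict(slc_files)
    start = time.perf_counter()
    workflow()
    return time.perf_counter() - start
```

Evicting inside the timing helper, rather than once per benchmark session, keeps every run cold even when several configurations read the same stack back to back.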