nextclade
PERF: Decompression using system zstd is quite a lot more efficient than using packaged zstd
Perhaps unsurprisingly, using a standalone zstd for decompression and piping into nextclade is quite a bit faster than using the packaged zstd.
We could add a note somewhere warning about this: the built-in compression support should really only be used for interactive CLI convenience, not for scripting or production.
Here are the timings:

```
❯ time zstd -dc 10k.fasta.zst | nextclade run -D data --output-fasta seq.fasta.zst
________________________________________________________
Executed in   40.33 secs      fish           external
   usr time  436.83 secs     16.42 millis  436.81 secs
   sys time   20.30 secs      2.57 millis   20.29 secs

❯ time nextclade run -D data --output-fasta seq.fasta.zst 10k.fasta.zst
______________________________________
Executed in   66.30 secs      fish           external
   usr time  696.01 secs     48.32 millis  695.97 secs
   sys time   32.27 secs      7.77 millis   32.26 secs
```
Using the built-in decompression adds more than 50% to the runtime. It's odd that the packaged zstd uses so much CPU: when piping, the external zstd sits at only about 1% CPU, while the timings above suggest around 250% for the packaged one.
Note: this was run on an Intel MacBook Pro with 6 cores / 12 threads.
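For reference, the figures quoted above fall out of the timing output directly; a quick sketch (all numbers copied from the two runs):

```shell
# Derive the quoted figures from the fish `time` output above.
awk 'BEGIN {
  piped_wall   = 40.33; piped_usr   = 436.83; piped_sys   = 20.30
  builtin_wall = 66.30; builtin_usr = 696.01; builtin_sys = 32.27

  # Wall-clock penalty of the built-in decompressor:
  printf "wall-time increase: %.0f%%\n", (builtin_wall / piped_wall - 1) * 100

  # Extra CPU time spent, presumably inside the packaged zstd:
  extra = (builtin_usr + builtin_sys) - (piped_usr + piped_sys)
  printf "extra CPU seconds: %.0f\n", extra
}'
```

This prints a ~64% wall-time increase and roughly 270 extra CPU-seconds for the built-in run.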
I can't reproduce this on Linux (GNU flavor). In my measurements, Nextclade's built-in decompression is always marginally faster than the external one.
Could you try on a Linux machine? And what parameters did you use to compress the input file?
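For checking the latter, zstd itself can report how an existing archive was compressed and benchmark different levels; a sketch using the file name from the timings above (both flags are standard zstd options):

```shell
# List frame metadata of an existing archive (frame count, window size,
# compressed/uncompressed sizes, checksum); add -v for per-frame detail:
zstd -l 10k.fasta.zst

# Benchmark compression levels 1 through 19 on the raw input to see how the
# chosen level affects compression and decompression speed:
zstd -b1 -e19 10k.fasta
```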
Nextclade might be less efficient on macOS. For example, we use the default memory allocator there, while using the more efficient mimalloc on Linux. There may be other differences too.
I expect that most production workflows are run on Linux.