
PERF: Decompression using system zstd is quite a lot more efficient than using packaged zstd

Open · corneliusroemer opened this issue · 1 comment

Maybe not surprisingly, using a standalone zstd for decompression and piping into Nextclade is quite a bit faster than using the packaged zstd.

I guess we could add a note somewhere warning about this: basically, the built-in compression support should only really be used for interactive CLI convenience, not for scripting/production, where piping through the system zstd (as sketched below) is preferable.
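Something like this minimal sketch, which keeps all (de)compression outside of Nextclade; the file and dataset names are just the ones from the timings below, and only standard zstd flags are used:

# Decompress the input with the system zstd and stream it into nextclade on stdin;
# writing to seq.fasta (no .zst extension) should produce plain, uncompressed FASTA.
zstd -dc 10k.fasta.zst \
  | nextclade run -D data --output-fasta seq.fasta \
  && zstd -q --rm seq.fasta   # compress the output afterwards: yields seq.fasta.zst, removes seq.fasta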

Here are the timings:

❯ time zstd -dc 10k.fasta.zst | nextclade run -D data --output-fasta seq.fasta.zst
________________________________________________________
Executed in   40.33 secs    fish           external
   usr time  436.83 secs   16.42 millis  436.81 secs
   sys time   20.30 secs    2.57 millis   20.29 secs

❯ time nextclade run -D data --output-fasta seq.fasta.zst  10k.fasta.zst
________________________________________________________
Executed in   66.30 secs    fish           external
   usr time  696.01 secs   48.32 millis  695.97 secs
   sys time   32.27 secs    7.77 millis   32.26 secs

Using the built-in decompression adds more than 50% to the runtime (66.3 s vs 40.3 s wall clock, roughly 64% longer). It's odd how much CPU the packaged zstd uses: when piping, the standalone zstd sits at around 1% CPU, while the timings above suggest roughly 250% for the packaged decompression.

Note: this was run on an Intel MacBook Pro with 6 cores / 12 threads.

corneliusroemer · Jun 30 '22 11:06

Can't reproduce this on Linux (GNU flavor). In my measurements, Nextclade's built-in decompression is consistently marginally faster than the external one.

Can you try on a Linux machine? What parameters do you use for the input file compression?
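Just to illustrate what I mean by compression parameters (not necessarily what was used above): the level, thread count, etc. passed to zstd when the input file was created, for example:

# Purely illustrative: compress the input at level 19 using all cores.
zstd -T0 -19 10k.fasta -o 10k.fasta.zst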

Nextclade might be less efficient on macOS. For example, we use the default memory allocator there, while using the more efficient mimalloc on Linux. There might be other differences.

I expect that most production workflows are run on Linux.

ivan-aksamentov · Jun 30 '22 13:06