add gather implementation benchmarks to sourmash docs

Open ctb opened this issue 1 month ago • 0 comments

hackmd: link

gather benchmarking - sourmash v4.8.10 / branchwater plugin v0.9.5

Source repo: sourmash-bio/2024-benchmark-gather

sample SRR1976948

This sample contains 177 genomes.

Benchmarking results with 64 threads (note that pygather uses only 1 thread).

prefix                     s          max_rss
fastmultigather_rocksdb    102.103    515.24
fastgather                 152.312    13071.1
fastmultigather            441.748    13029.6
pygather                   2768.48    13755.2

Notes:

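  • Memory consumption is roughly the same for all non-rocksdb implementations (max_rss of about 13,000 to 13,800); fastmultigather_rocksdb uses far less (about 515).
  • fastmultigather_rocksdb is the fastest overall (102 s). Among the non-rocksdb implementations, fastgather (152 s) is much faster than fastmultigather (442 s) and pygather (2768 s).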

These trends held across all four samples.
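
As a rough sanity check, the relative speedups implied by the table above can be computed with a short script. This is only a sketch: the numbers are copied by hand from the table, and the names `results` and `speedup_vs` are illustrative, not part of sourmash or the benchmark repo.

```python
# Timings (column "s") and memory (column "max_rss") copied from the table above.
results = {
    "fastmultigather_rocksdb": (102.103, 515.24),
    "fastgather": (152.312, 13071.1),
    "fastmultigather": (441.748, 13029.6),
    "pygather": (2768.48, 13755.2),
}


def speedup_vs(baseline, results):
    """Return how many times faster each implementation is than `baseline`."""
    base_s = results[baseline][0]
    return {name: base_s / s for name, (s, _rss) in results.items()}


speedups = speedup_vs("pygather", results)

# Print implementations from fastest to slowest.
for name, factor in sorted(speedups.items(), key=lambda kv: -kv[1]):
    print(f"{name:<26}{factor:6.1f}x vs pygather")
```

Keep in mind that pygather ran with 1 thread while the others ran with 64, so these ratios mix implementation differences with parallelism.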

rocksdb indexing of GTDB rs214

Indexing GTDB rs214 (400k sequences) took 4h 47m (17255 s) and used 14 GB of memory. The resulting rocksdb index is 7 GB.
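
The indexing figures above can be cross-checked with a couple of lines of arithmetic. This is a sketch only: `n_sequences` uses the rounded "400k" figure from the text, so the resulting rate is approximate.

```python
from datetime import timedelta

index_seconds = 17255   # indexing time reported above
n_sequences = 400_000   # "400k" is a rounded figure, so the rate is approximate

# Convert seconds to H:MM:SS to compare against the "4h 47m" figure.
elapsed = str(timedelta(seconds=index_seconds))

# Approximate indexing throughput.
rate = n_sequences / index_seconds

print(f"elapsed: {elapsed}, approx. {rate:.1f} sequences/s")
```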

ctb, Jun 30 '24 20:06