
Extract then merge

Open alienzj opened this issue 9 months ago • 9 comments

When processing a large number of samples in SemiBin's multi-sample binning mode, generating data_cov.csv and data_split_cov.csv may require 1 TB+ of memory. This PR extracts contig coverage for each sample first and then merges the per-sample results, which can significantly reduce memory usage.

After testing, I found that it was still very slow when processing many (1K+) CSV files, so I updated the code to use polars to parse the CSV files.
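For readers who want a picture of the extract-then-merge idea, here is a minimal sketch. It is not the code from this PR: it uses Python's standard `csv` module rather than polars so that it runs without extra dependencies, and the file layout (a `contig,cov` file per sample), the function names `write_sample_coverage` and `merge_sample_coverage`, and the requirement that every per-sample file lists the same contigs in the same order are all assumptions made for illustration.

```python
import csv
from contextlib import ExitStack
from pathlib import Path


def write_sample_coverage(path, coverage):
    """Extract step (hypothetical layout): write one small CSV per sample."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["contig", "cov"])
        # Sort contig names so every per-sample file uses the same row order.
        for contig in sorted(coverage):
            writer.writerow([contig, coverage[contig]])


def merge_sample_coverage(sample_paths, merged_path):
    """Merge step: read all per-sample files in lockstep, one row at a time.

    Only one row per sample is held in memory at any moment, instead of the
    full contigs x samples table.
    """
    with ExitStack() as stack:
        readers = [
            csv.reader(stack.enter_context(open(p, newline="")))
            for p in sample_paths
        ]
        out = stack.enter_context(open(merged_path, "w", newline=""))
        writer = csv.writer(out)

        for reader in readers:
            next(reader)  # skip each per-sample header row

        # One output column per sample, named after the file stem.
        writer.writerow(["contig"] + [Path(p).stem for p in sample_paths])

        for rows in zip(*readers):
            contig = rows[0][0]
            # Assumption check: all files must agree on the contig order.
            if any(row[0] != contig for row in rows):
                raise ValueError(f"contig order differs at {contig!r}")
            writer.writerow([contig] + [row[1] for row in rows])
```

Calling `merge_sample_coverage(["S1.csv", "S2.csv"], "merged.csv")` would produce a table with one row per contig and one column per sample. Note that this sketch keeps every per-sample file open at once, so with 1K+ samples the operating system's open-file limit may need to be raised or the merge done in batches.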

alienzj, Apr 25 '24 05:04