SemiBin
Extract then merge
When processing large-scale samples with SemiBin in multi-sample binning mode, generating `data_cov.csv` and `data_split_cov.csv` may require more than 1 TB of memory. This PR extracts the contig coverage for each sample first and then merges the per-sample results, which can significantly reduce memory usage.
After testing, I found that processing many (1K+) CSV files was still very slow, so I updated the code to parse the CSV files with `polars`.
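
To illustrate the extract-then-merge idea, here is a minimal sketch. It is not the code in this PR: it uses only the Python standard library rather than `polars`, the function names and the `contig`/`cov`/`sample_N` column names are hypothetical, and it assumes every per-sample file lists the same contigs in the same order. Under that assumption the merge step only ever holds one row per sample in memory.

```python
import csv
from contextlib import ExitStack


def write_sample_coverage(path, coverage):
    """Extract step: write one sample's contig coverage to its own CSV.

    `coverage` is an iterable of (contig, cov) pairs (hypothetical layout).
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["contig", "cov"])
        for contig, cov in coverage:
            writer.writerow([contig, cov])


def merge_sample_coverage(sample_paths, out_path):
    """Merge step: stream all per-sample files in lockstep into one wide CSV.

    Assumes every file lists the same contigs in the same order, so no
    per-sample table has to be loaded into memory as a whole.
    """
    with ExitStack() as stack:
        readers = []
        for path in sample_paths:
            reader = csv.reader(stack.enter_context(open(path, newline="")))
            next(reader)  # skip the header row
            readers.append(reader)

        out = stack.enter_context(open(out_path, "w", newline=""))
        writer = csv.writer(out)
        header = ["contig"] + [f"sample_{i}" for i in range(len(sample_paths))]
        writer.writerow(header)

        # zip() pulls one row from each per-sample reader at a time.
        for rows in zip(*readers):
            contig = rows[0][0]
            if any(row[0] != contig for row in rows):
                raise ValueError(f"contig order mismatch at {contig}")
            writer.writerow([contig] + [row[1] for row in rows])
```

A call such as `merge_sample_coverage(["s0.csv", "s1.csv"], "merged.csv")` would then produce one row per contig with one coverage column per sample. If the per-sample files are not guaranteed to share the same contig order, a keyed join (for example in `polars`) would be needed instead of this lockstep merge.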