Feature request: Separate hifiasm into stages

Open SHuang-Broad opened this issue 5 months ago • 10 comments

Hi,

Is it possible to separate hifiasm into stages (e.g. separating the read-error correction step and the phased string graph generation step)?

What initially led us to ask for this functionality is a use case where we want both the diploid assembly and the alternative contigs for some investigation.

Thank you! Steve

SHuang-Broad avatar Feb 05 '24 18:02 SHuang-Broad

I am also interested in this. I looked at doing it by modifying the source code, and while I succeeded, it was quite challenging and the solution I came up with was a bit hacky.

vellamike avatar Feb 06 '24 09:02 vellamike

You can easily rerun with the bin files to get a primary/alternative, dual, or trio/Hi-C assembly, as long as you use the same prefix.

baozg avatar Feb 06 '24 15:02 baozg

Oh, that's good to know, @baozg. Just to confirm: hifiasm will automatically "resume" the work if it detects bin files matching the provided prefix?

SHuang-Broad avatar Feb 06 '24 15:02 SHuang-Broad

Yes, hifiasm will reuse all the bin files if they exist. But be careful if they were generated by a different version of hifiasm.
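Before rerunning, it can help to confirm that the bin files for a prefix are actually present. The snippet below is a minimal sketch, not part of hifiasm: `have_bins` is a hypothetical helper, and it assumes the bin files are named `<prefix>.ec.bin`, `<prefix>.ovlp.source.bin` and `<prefix>.ovlp.reverse.bin`, with `hifiasm.asm` as the prefix when `-o` is not given. Hi-C runs may write additional bin files that are not checked here.

```shell
#!/usr/bin/env bash
# Sketch: report whether the bin files for a given prefix are present.
# Assumption: the three file suffixes below match what your hifiasm
# version writes; adjust them if your output directory looks different.

have_bins() {
    # Succeeds only if every expected bin file exists and is non-empty.
    local prefix=$1 suffix
    for suffix in ec.bin ovlp.source.bin ovlp.reverse.bin; do
        [ -s "${prefix}.${suffix}" ] || return 1
    done
    return 0
}

# Use the first argument as the prefix, or fall back to the assumed default.
prefix=${1:-hifiasm.asm}
if have_bins "$prefix"; then
    echo "bin files found for ${prefix}"
else
    echo "bin files missing for ${prefix}"
fi
```

This only checks that the files exist. It does not tell you which hifiasm version produced them, so the version caveat above still applies.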

baozg avatar Feb 06 '24 15:02 baozg

Awesome! I'll test run with our samples and report back.

Thank you @baozg !

SHuang-Broad avatar Feb 06 '24 15:02 SHuang-Broad

Hello @vellamike @SHuang-Broad @baozg, sorry for the late reply; I was too busy during the last few weeks. Actually, the ‘--bin-only’ option might work. For example, if you would like to run hifiasm (Hi-C) in one step, the command line should be as follows:

hifiasm -t48 --h1 HiC_r1.fq --h2 HiC_r2.fq HiFi.fq

With ‘--bin-only’, the whole assembly procedure could be separated into two steps:

hifiasm -t48 --h1 HiC_r1.fq --h2 HiC_r2.fq --bin-only HiFi.fq  # hifiasm will only produce bin files for error correction
hifiasm -t48 --h1 HiC_r1.fq --h2 HiC_r2.fq HiFi.fq  # hifiasm will reuse the bin files

Basically, with ‘--bin-only’, hifiasm stops as soon as the bin files have been generated.
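The two steps can be wrapped in a small script so that each stage can run on a different machine or with a different thread count. This is only a sketch under stated assumptions: the file names are the placeholders used in this thread, and `run`, `stage_bins`, `stage_assemble`, `DRY_RUN`, `BIN_THREADS` and `ASM_THREADS` are hypothetical names introduced for illustration, not hifiasm features. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the two-step Hi-C workflow, split into two shell functions.
# Assumption: input file names are placeholders; DRY_RUN is a hypothetical
# switch of this script (1 = print commands only, 0 = execute them).
set -euo pipefail

HIFI=${HIFI:-HiFi.fq}
HIC_R1=${HIC_R1:-HiC_r1.fq}
HIC_R2=${HIC_R2:-HiC_r2.fq}
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command line; execute it only when DRY_RUN=0.
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Step 1: with --bin-only, hifiasm stops once the bin files are written.
stage_bins() {
    run hifiasm -t"${BIN_THREADS:-48}" --h1 "$HIC_R1" --h2 "$HIC_R2" --bin-only "$HIFI"
}

# Step 2: same inputs without --bin-only, so the existing bin files are reused.
stage_assemble() {
    run hifiasm -t"${ASM_THREADS:-48}" --h1 "$HIC_R1" --h2 "$HIC_R2" "$HIFI"
}

stage_bins
stage_assemble
```

Setting `DRY_RUN=0` runs both stages back to back. To run them on separate machines, call only one of the two functions on each machine and make sure the bin files from the first step are in the working directory of the second.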

chhylp123 avatar Feb 15 '24 05:02 chhylp123

Thank you, @chhylp123 !

Following your suggestion, I ran a few experiments and it works as expected!

I've attached a few plots here demonstrating how CPU, memory and disk space are used throughout the process. Hopefully this is useful. For bin generation, I used 42 cores. For the actual assembly steps, I used 28 cores.

Btw, this --bin-only flag isn't documented anywhere, but I believe it should be. Here's the reason: you can see from the monitoring plots that the bin-generation stage is the main "bottleneck". It needs the most resources and lasts 16 hours. The assembly steps not only use just a few threads most of the time (~2 hours), but also need less memory. For those of us who do computations in the cloud, we can reduce costs by using non-spot VMs for the bin-generation stage and then switching over to spot VMs configured with fewer resources for the assembly steps.

Again, thank you for the suggestion!

AltModeUsingBinFiles.HighCoverage.monitoring.log.pdf
BinGeneration.HighCoverage.monitoring.log.pdf
HapModeUsingBinFiles.HighCoverage.monitoring.log.pdf

Steve

SHuang-Broad avatar Mar 24 '24 21:03 SHuang-Broad