
No final.hic file generated

Open ithacajing opened this issue 2 years ago • 5 comments

Hi,

Here is my command; it took almost 28 days to finish.

run-asm-pipeline.sh -m diploid -i 5000 --splitter-coarse-resolution 100000 --splitter-fine-resolution 1000 Perrie_HiC.asm.hic.p_ctg.fa merged_nodups.txt

Getting from *.rawchrom.fasta to FINAL.fasta took 26 days (super SLOW). Most of that time was spent generating the alignments.txt file.

The rawchrom.fasta file is 2.1G. I am wondering whether that is too big for the pipeline to handle.

I got *.final.asm, final.cprops, *final.assembly, *FINAL.fasta and *FINAL.assembly; however, no final.hic file was generated.

I improved the pipeline performance by doing the following:

@debian:~/tools/improved-3d-dna$ diff run-asm-pipeline.sh ../3d-dna/run-asm-pipeline.sh
194,195c194
< #default_merger_lastz_options="--gfextend\ --gapped\ --chain=200,200"
< default_merger_lastz_options="--gfextend\ --gapped\ --chain=200,200\ --allocate:traceback=1.99G"
---
> default_merger_lastz_options="--gfextend\ --gapped\ --chain=200,200"
831,834c830
< # improve the performance
< # wrapped.fasta is much faster than orig_fasta
< awk -f ${pipeline}/utils/wrap-fasta-sequence.awk ${orig_fasta} > ${genomeid}.wrapped.fasta
< awk -f ${pipeline}/edit/edit-fasta-according-to-new-cprops.awk ${genomeid}.rawchrom.cprops ${genomeid}.wrapped.fasta > ${genomeid}.rawchrom.fasta
---
> awk -f ${pipeline}/edit/edit-fasta-according-to-new-cprops.awk ${genomeid}.rawchrom.cprops ${orig_fasta} > ${genomeid}.rawchrom.fasta

Any suggestions as to why the final.hic file was missing and why the alignment step was SO slow? How can we speed it up?

Thanks, jing

ithacajing avatar Jun 22 '22 13:06 ithacajing

Hello Jing,

Yes, wrapping the fasta is the right way to address the speed issue. This is discussed on the forum (http://aidenlab.org/forum.html). I believe this is also done automatically in the latest 3d-dna release (associated with Hoencamp et al., 2021, aka the phasing branch: https://github.com/aidenlab/3d-dna/releases/tag/201008).
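For readers who have not seen what "wrapping" means here, a minimal sketch follows. It is not the pipeline's own wrap-fasta-sequence.awk; the function name `wrap_fasta` and the default width of 80 are assumptions made for illustration. It rewrites each record so that no sequence line is longer than the chosen width, leaving header lines untouched.

```shell
# Hypothetical helper (not part of 3d-dna): re-wrap FASTA sequence lines.
# Usage: wrap_fasta input.fasta [width]   (width defaults to 80, an assumption)
wrap_fasta() {
  awk -v w="${2:-80}" '
    /^>/ {
      # Close any partially filled sequence line before a new header.
      if (col > 0) { printf "\n"; col = 0 }
      print
      next
    }
    {
      # Emit the input line in chunks so every output line has at most w bases.
      n = length($0); pos = 1
      while (pos <= n) {
        take = w - col
        if (take > n - pos + 1) take = n - pos + 1
        printf "%s", substr($0, pos, take)
        pos += take; col += take
        if (col == w) { printf "\n"; col = 0 }
      }
    }
    END { if (col > 0) printf "\n" }
  ' "$1"
}
```

Something like `wrap_fasta assembly.fasta > assembly.wrapped.fasta` (file names hypothetical) could then stand in for the unwrapped input when testing whether line length is what slows a run down.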

We unfortunately do not support the diploid mode any longer, but thank you for your comments regarding traceback. I think it can probably be helpful if the contig N50 is very large. There is no final.hic with the diploid mode because lifting over read alignment positions is not a trivial task after the alt haplotype merge. This is expected behavior. To build the final.hic you will have to rerun Juicer, treating the output assembly as your new draft, and visualize the resulting contact map (see p.5 of the Genome Assembly Cookbook, http://dnazoo.org/methods).
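One small preparatory step for such a rerun can be sketched here, on the assumption that Juicer is given a chrom.sizes-style table (sequence name, tab, length) for the new draft. The function name `fasta_sizes` is hypothetical and the snippet is not taken from the cookbook; it only derives that table from a fasta file.

```shell
# Hypothetical helper: print "name<TAB>length" for every record in a FASTA file.
# The name is the first whitespace-delimited token of the header, without ">".
fasta_sizes() {
  awk '
    /^>/ {
      if (name != "") printf "%s\t%.0f\n", name, len
      name = substr($1, 2); len = 0
      next
    }
    { len += length($0) }   # sequence lines may be wrapped or unwrapped
    END { if (name != "") printf "%s\t%.0f\n", name, len }
  ' "$1"
}
```

For example, `fasta_sizes genome_FINAL.fasta > genome_FINAL.chrom.sizes` (file names hypothetical) produces one line per scaffold.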

Best, Olga


dudcha avatar Jun 23 '22 20:06 dudcha

Hi Olga,

Thank you so much for your explanation and advice.

Best, Jing

ithacajing avatar Jun 28 '22 13:06 ithacajing

Hi,

I ran it in haploid mode for 8 rounds of correction and didn't get the final.hic file either. Interestingly, I soft-linked the merged_nodups.txt file from the juicer output folder, and 3d-dna somehow turned it into an empty file. Do you know what could be the reason?

best, Cui

smallfishcui avatar Aug 03 '22 08:08 smallfishcui

I am sorry, this is too vague of a description to diagnose. Please note that 8 rounds of correction are almost never necessary and you get the most bang for your buck within the first 2-3 rounds. Thanks, -Olga
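The cause of the emptied merged_nodups.txt was not diagnosed in this thread. As a generic precaution rather than a fix, a pre-flight check along these lines can catch an empty or missing input before a long run starts. The function name `check_input` is hypothetical and nothing here is part of 3d-dna.

```shell
# Hypothetical pre-flight check: fail early if an input file is missing or empty,
# and warn when it is a symlink (writes through a symlink modify the link target).
check_input() {
  if [ ! -e "$1" ]; then
    echo "missing: $1" >&2
    return 1
  fi
  if [ ! -s "$1" ]; then
    echo "empty: $1" >&2
    return 1
  fi
  if [ -L "$1" ]; then
    echo "note: $1 is a symlink; anything that writes to it changes the original" >&2
  fi
  return 0
}
```

Calling `check_input merged_nodups.txt || exit 1` before launching the pipeline, and keeping the Juicer output read-only or working on a copy, would at least make it clear at which point the file changed.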


dudcha avatar Aug 03 '22 12:08 dudcha

Thanks, Olga. I somehow managed to finish the pipeline in another run, and the result looks quite satisfying!

best, Cui

smallfishcui avatar Aug 08 '22 08:08 smallfishcui