
[morecore] insufficient memory

Open mictadlo opened this issue 4 years ago • 9 comments

Hi, Contigs.txt and NbV1ChF.fasta are 2.8 GB and 2.6 GB, respectively. Minimap2 seems to run out of memory on a 2 TB machine.

$ ragoo.py -t 8 -g 100 -s -b -gff augustus.hints_utr.gff3 Contigs.txt NbV1ChF.fasta
Mon Oct 28 01:45:52 2019 --- Running : minimap2 -k19 -w19 -t8 ../NbV1ChF.fasta ../Contigs.txt > contigs_against_ref.paf 2> contigs_against_ref.paf.log
Mon Oct 28 01:49:23 2019 --- Reading alignments
Mon Oct 28 01:52:45 2019 --- Getting gff features
Mon Oct 28 01:53:07 2019 --- Getting contigs
Mon Oct 28 01:53:25 2019 --- Finding interchromosomally chimeric contigs
Mon Oct 28 01:53:25 2019 --- Finding break points and breaking interchromosomally chimeric contigs
Mon Oct 28 01:53:45 2019 --- Running : minimap2 -k19 -w19 -t8 ../../NbV1ChF.fasta Contigs.inter.chimera.broken.fa > inter_contigs_against_ref.paf 2> inter_contigs_against_ref.paf.log
Mon Oct 28 01:57:22 2019 --- Reading interchromosomal chimera broken alignments
Mon Oct 28 02:00:58 2019 --- Finding intrachromosomally chimeric contigs
Mon Oct 28 02:01:52 2019 --- Running : minimap2 -k19 -w19 -t8 ../../NbV1ChF.fasta Contigs.intra.chimera.broken.fa > intra_contigs_against_ref.paf 2> intra_contigs_against_ref.paf.log
Mon Oct 28 02:05:25 2019 --- Reading intrachromosomal chimera broken alignments
Mon Oct 28 02:09:19 2019 --- The total number of interchromasomally chimeric contigs broken is 0
Mon Oct 28 02:09:19 2019 --- The total number of intrachromasomally chimeric contigs broken is 6
Mon Oct 28 02:09:19 2019 --- Assigning contigs
Mon Oct 28 02:09:40 2019 --- Ordering and orienting contigs
Mon Oct 28 02:11:01 2019 --- Creating pseudomolecules
Mon Oct 28 05:26:02 2019 --- Aligning pseudomolecules to reference
Mon Oct 28 05:26:02 2019 --- Running : minimap2 -ax asm5 --cs -t8 ../../NbV1ChF.fasta ../ragoo.fasta > pm_against_ref.sam 2> pm_contigs_against_ref.sam.log
/bin/bash: line 1: 10389 Aborted                 (core dumped) minimap2 -ax asm5 --cs -t8 ../../NbV1ChF.fasta ../ragoo.fasta > pm_against_ref.sam 2> pm_contigs_against_ref.sam.log
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/ragoo/bin/ragoo.py", line 4, in <module>
    __import__('pkg_resources').run_script('RaGOO==1.1', 'ragoo.py')
  File "/home/ubuntu/miniconda3/envs/ragoo/lib/python3.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/ubuntu/miniconda3/envs/ragoo/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1469, in run_script
    exec(script_code, namespace, namespace)
  File "/home/ubuntu/miniconda3/envs/ragoo/lib/python3.7/site-packages/RaGOO-1.1-py3.7.egg/EGG-INFO/scripts/ragoo.py", line 754, in <module>
  File "/home/ubuntu/miniconda3/envs/ragoo/lib/python3.7/site-packages/RaGOO-1.1-py3.7.egg/EGG-INFO/scripts/ragoo.py", line 439, in align_pms
  File "/home/ubuntu/miniconda3/envs/ragoo/lib/python3.7/site-packages/RaGOO-1.1-py3.7.egg/ragoo_utilities/utilities.py", line 25, in run
RuntimeError: Failed : minimap2 -ax asm5 --cs -t8 ../../NbV1ChF.fasta ../ragoo.fasta > pm_against_ref.sam 2> pm_contigs_against_ref.sam.log
less pm_contigs_against_ref.sam.log 
[M::mm_idx_gen::55.511*1.79] collected minimizers
[M::mm_idx_gen::60.192*2.17] sorted minimizers
[M::main::60.192*2.17] loaded/built the index for 19 target sequence(s)
[M::mm_mapopt_update::64.517*2.10] mid_occ = 342
[M::mm_idx_stat] kmer size: 19; skip: 19; is_hpc: 0; #seq: 19
[M::mm_idx_stat::69.857*2.01] distinct minimizers: 143929053 (83.21% are singletons); average occurrences: 1.928; average spacing: 9.884
[M::worker_pipeline::3024.805*3.40] mapped 5 sequences
[M::worker_pipeline::4921.171*3.81] mapped 6 sequences
[M::worker_pipeline::7761.560*3.75] mapped 4 sequences
[morecore] insufficient memory

How important is the alignment from the "Aligning pseudomolecules to reference" step, or can ragoo.fasta be used as is? How much more memory do you think I would need?

Thank you in advance,

Michal

mictadlo, Oct 28 '19

Hmm, that is interesting. First, yes, you can still use ragoo.fasta; the SV-calling step is independent of it.

I am not sure why Minimap2 is running out of memory. For debugging, you can run the same or a similar command outside of RaGOO. What are you assembling?
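To rerun the failing step outside RaGOO and see how much memory it actually peaks at, a small wrapper can record the child process's maximum resident set size. This is a minimal sketch, not part of RaGOO: `run_and_measure` is a hypothetical helper, and it relies on Python's Unix-only `resource` module.

```python
import resource
import subprocess


def run_and_measure(cmd, stdout_path, stderr_path):
    """Run `cmd` (a list of arguments) with stdout/stderr redirected to files.

    Returns (return_code, peak_rss). peak_rss comes from
    getrusage(RUSAGE_CHILDREN).ru_maxrss, i.e. the resident set size of the
    largest child this Python process has waited for so far, so use a fresh
    interpreter per command to get a per-command number.
    Linux reports the value in KiB, macOS in bytes.
    """
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        proc = subprocess.run(cmd, stdout=out, stderr=err)
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, peak_rss


# Example (not executed here): rerun the failing step from the log by hand.
# cmd = ["minimap2", "-ax", "asm5", "--cs", "-t8",
#        "../../NbV1ChF.fasta", "../ragoo.fasta"]
# rc, peak = run_and_measure(cmd, "pm_against_ref.sam",
#                            "pm_contigs_against_ref.sam.log")
# A negative rc means the child was killed by a signal; -6 is SIGABRT,
# which bash reports as "Aborted (core dumped)".
```

If a Python wrapper is unnecessary, GNU `/usr/bin/time -v` prints the same figure as "Maximum resident set size (kbytes)".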

Thanks

malonge, Oct 29 '19

The reference is a plant genome that we assembled using PacBio and Hi-C data. Contigs.txt, on the other hand, is a pure 50x Illumina assembly created with SparseAssembler.

I will also run minimap2 manually.

Michal

mictadlo, Oct 29 '19

At this point, if you would like to call SVs, I suggest doing the SV calling manually. You can use either minimap2/paftools or nucmer/Assemblytics. If you use paftools, you might as well write your minimap2 alignments to a PAF file rather than SAM. I also have a wiki page about this here, though you would have to generate your own alignments.
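The paftools route expects PAF alignments carrying the `cs` tag (minimap2 `--cs`) and sorted by target name and then target start; the paftools documentation does this with `sort -k6,6 -k8,8n` before running `paftools.js call`. As a minimal sketch of that sorting step in Python, assuming well-formed tab-separated PAF (the `PafRecord` type and both function names are hypothetical):

```python
from typing import Iterable, List, NamedTuple


class PafRecord(NamedTuple):
    """The 12 mandatory PAF columns; optional tags (e.g. cs:Z:...) are kept raw."""
    qname: str
    qlen: int
    qstart: int
    qend: int
    strand: str
    tname: str
    tlen: int
    tstart: int
    tend: int
    nmatch: int
    alnlen: int
    mapq: int
    tags: List[str]


def parse_paf_line(line: str) -> PafRecord:
    f = line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError(f"PAF line has {len(f)} columns, expected at least 12")
    return PafRecord(
        f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
        f[5], int(f[6]), int(f[7]), int(f[8]),
        int(f[9]), int(f[10]), int(f[11]), f[12:],
    )


def sort_paf(lines: Iterable[str]) -> List[PafRecord]:
    """Group by target name (byte order, like LC_ALL=C) and sort numerically
    by target start, mirroring `sort -k6,6 -k8,8n` up to tie-breaking."""
    records = [parse_paf_line(l) for l in lines if l.strip()]
    return sorted(records, key=lambda r: (r.tname, r.tstart))
```

Sorting in Python only makes sense for modest PAF files, since every record is held in memory; for whole-genome alignments the shell `sort` is the lighter option.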

malonge, Oct 29 '19

Hi, I think the problem was that the assembly contains many small contigs that could not be assigned to the reference. In addition, I asked RaGOO to put 100 Ns between the unplaced contigs, which led to a Chr0 size of 550,412,653 bp.

The solution was to run RaGOO with -C, which writes unplaced contigs individually instead of concatenating them into Chr0.
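The arithmetic behind the oversized Chr0: when unplaced contigs are joined end to end with `-g` Ns between neighbours, every additional contig adds one more gap. A minimal sketch, assuming gaps only between adjacent contigs (none at either end); `chr0_length` is a hypothetical helper, not RaGOO code.

```python
def chr0_length(contig_lengths, gap_size=100):
    """Length of a pseudomolecule built by concatenating contigs with
    `gap_size` Ns between each adjacent pair (assumed: no leading or
    trailing gap). Returns (total_length, padding_ns)."""
    n = len(contig_lengths)
    padding_ns = gap_size * max(n - 1, 0)  # one gap per adjacent pair
    return sum(contig_lengths) + padding_ns, padding_ns
```

For example, `chr0_length([1000, 500, 250], gap_size=100)` returns `(1950, 200)`. With a hypothetical one million unplaced contigs, the padding alone would be 100 × 999,999 = 99,999,900 Ns, on top of the contig sequence itself.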

Michal

mictadlo, Nov 20 '19

@malonge I have seen this as well when running RaGOO on a species with a large genome (>10 Gbp). In my experience, the out-of-memory problem also happens earlier, at the pseudomolecule-creation stage.

I am working on some of this in my fork (https://github.com/lucventurini/RaGOO). When I am done, may I open a pull request?
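One general way to keep memory bounded while creating pseudomolecules is to stream contigs from the input and write each contig and gap straight to the output, instead of building the whole chromosome as one string. This is a minimal sketch of that idea, not RaGOO's or the fork's actual code; `read_fasta` and `write_pseudomolecule` are hypothetical names, and the 60-column line wrapping is an assumption.

```python
from typing import Iterable, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) one record at a time, so only the current
    record is held in memory. The name is the first word of the header."""
    name, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def write_pseudomolecule(out: TextIO, name: str,
                         contigs: Iterable[Tuple[str, str]],
                         gap_size: int = 100, width: int = 60) -> int:
    """Write contigs joined by `gap_size` Ns as one FASTA record, wrapped at
    `width` columns, without building the full pseudomolecule string.
    Returns the number of bases written."""
    out.write(f">{name}\n")
    col = 0    # characters already written on the current output line
    total = 0  # bases written so far (contigs plus gap Ns)

    def emit(seq: str) -> None:
        nonlocal col, total
        i = 0
        while i < len(seq):
            take = min(width - col, len(seq) - i)
            out.write(seq[i:i + take])
            i += take
            col += take
            total += take
            if col == width:  # line is full: wrap
                out.write("\n")
                col = 0

    for idx, (_, seq) in enumerate(contigs):
        if idx > 0:
            emit("N" * gap_size)  # gap only between adjacent contigs
        emit(seq)
    if col:
        out.write("\n")  # terminate a final partial line
    return total
```

Because `read_fasta` is a generator, peak memory with this approach is set by the largest single contig rather than by the total size of the pseudomolecule.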

lucventurini, Nov 20 '19

@mictadlo glad to hear you have resolved it. @lucventurini absolutely, thanks so much for contributing.

malonge, Nov 20 '19

Hi,

I'm wondering whether the pull request has been done, and therefore whether RaGOO should now work without problems on large genomes. Thank you and have a great day! Cheers

C.

cmonat, Apr 20 '20

Hi there,

I am currently working on v2, which uses pysam to dramatically reduce the memory requirements. In fact, the memory for small and large genomes should be roughly the same.

I am hoping to release v2 in the next month or so. I will reopen the issue so that I can post a note when the new version is ready.

Thanks

malonge, Apr 20 '20

Hi there,

RagTag, the successor to RaGOO, is now available here:

https://github.com/malonge/RagTag

RagTag now uses pysam to query read coverage, so the memory requirement is dramatically reduced.

Thanks, Mike

malonge, Jun 09 '20