Minipolish icon indicating copy to clipboard operation
Minipolish copied to clipboard

Cause of read depth of 0.000x after polishing

Open desmodus1984 opened this issue 1 year ago • 6 comments

Hi,

I am trying to assemble the genome of a worm, and I was hoping that minipolish would detect the mitochondrial genome and output it in the gfa file. I have sequencing one of my worm species a lot, and I couldn't get much data, furthermore, the HPC system has a GPU card that is not compatible for dorado, so I am using dorado in CPU mode. After all the troubles, I was able to get about 3 Gb of data, but many reads with low quality, and many short, I filtered out reads with quality less than 20 and shorter than 1000 bp. My dataset is 1.3 Gb, with max read length of 113kb and average of 8.5k.

I followed the instructions: minimap2 -x ava-ont -t 12 Ju760.Jul22.fastq Ju760.Jul22.fastq | gzip -1 > Ju760.Jul22.paf.gz ``miniasm -f Ju760.Jul22.fastq ./Ju760.Jul22.paf.gz > Ju760.Jul22.gfa which worked fine,

but then, when I tried running minipolish minipolish -t 12 Ju760.Jul22.fastq Ju760.Jul22.gfa > Ju760.Jul22.polish.gfa I got a lot of empty segments removed, Racon round 2, started with this info Running Racon on round_2: reads: Ju760.Jul22.fastq (156,105 reads) input: /tmp/tmpbvbg0wzi/round_2.fasta (0 bp) alignments: /tmp/tmpbvbg0wzi/round_2.paf (0 alignments) output: /tmp/tmpbvbg0wzi/round_2_polished.fasta (0 bp)

And the output was: Aligning reads: reads: Ju760.Jul22.fastq (156,105 reads) contigs: /tmp/tmpbvbg0wzi/depths.fasta (0 bp) alignments: /tmp/tmpbvbg0wzi/depths.paf (0 alignments) mean depth: 0.000x

Could you explain how could I get a mean depth of 0.000x, I would expect to have at least 2x, from my data. Is there any parameter I can change during the process to get some gfa, at least the mitochondrial genome?

For instance, I mapped the reads to the c elegans mitogenome, and I got 277 sequences, making about 1.3 MB. Should I be able to recover the mitochondrial genome in the gfa?

Thanks;

Juan Pablo

desmodus1984 avatar Jul 22 '24 19:07 desmodus1984

Hi Juan,

Something has clearly gone wrong in during assembly or polishing: the input file to Racon seems to have a total size of 0 bp. I'm not sure where the issue is, but an assembly with a mean depth of 2x is bound to have some problems!

If all you're interested in is the mitochondrial genome, I would suggest trying to assemble just the 277 reads you identified as mitochondrial. This should be very fast and you'll know that it worked well if you get a circular contig of about the right size.

Ryan

rrwick avatar Jul 25 '24 04:07 rrwick

Hello Ryan,

I am extremely surprised right now haha.

I was finally able to get all the reads from my small/inefficient Nanopore run through dorado-CPU. I mapped them to the mitochondrial genome of C. elegans, and then I recovered the mapped reads. I tried the mapping/assembly/polishing twice, once, with all the reads, and then with reads with max-length of 15k. I used this max because the genome of C elegans is 13.7 kb.

I would appreciate if you could give me your comments about my approaches. I am surprised because the mitochondrial assembly with all the reads (max 44kb), gave me a circular mitochondrial genome (yay! you can imagine how happy I am!) of 10,043 bp and depth of 163x; the one with reads up to 15kb, I got a circular mitochondrial genome (YAY!!!), BUT SIGNIFICANTLY LONGER - 15,959 bp and Depth of 102.5x. image.

Could you find a reason why using longer reads gave such a shorter circular graph? I would expect that the reads in both cases would support or pinpoint to a mitochondrial genome of size Z; if the reads are weird, I imagine so hard trimming is happening, and edges with low support would be cut, but I don't see why it would end in a shorter graph. image

The code I used goes like this: minimap2 -x ava-ont -t 12 Ju760.Lig.P-1.mapped.max15k.fastq Ju760.Lig.P-1.mapped.max15k.fastq | gzip -1 > Ju760.Jul25.max15k.paf.gz miniasm -f Ju760.Lig.P-1.mapped.max15k.fastq **./**Ju760.Jul25.max15k.paf.gz > Ju760.Jul25.max15k.gfa minipolish -t 12 Ju760.Lig.P-1.mapped.max15k.fastq Ju760.Jul25.max15k.gfa > Ju760.Jul25.max15k.polish.gfa Does it matter the "./" in miniasm? Without it, It doesn't seem to work.

Looking forward to your comments and suggestions to help me confirm that the 15kb mitochondrial genome I obtained is reliable and I can trust it.

Thank you very much;

Juan Pablo

desmodus1984 avatar Jul 25 '24 23:07 desmodus1984

That's a tough one, and I don't know why you're getting the discrepancy. It will probably take some experimentation and detective work!

Some miscellaneous thoughts:

  • You're using a C. elegans mito genome to select your reads. Is that a reliable reference? If the C. elegans mito genome is too different from yours, you might be missing some reads.
  • Try other assemblers, e.g. Canu and Flye on your read sets. See if they tend to produce the 10 kbp version or the 15 kbp version. Can then use Trycycler to build a clean consensus from your alternative assemblies.
  • Long-read assemblers often struggle with small replicons like this. Assuming that your mito genome isn't full of repeats, you might paradoxically get a better assembly by chopping up your ONT reads into smaller pieces (e.g. 2 kbp) and then assembling.
  • For each of your assemblies, try aligning your ONT reads and then visualising the alignments in IGV. Sometimes this can help you spot oddities.

rrwick avatar Jul 26 '24 01:07 rrwick

Thanks.

Do you know if the polished assembly graph can't be polished a second round? I got all the read from the other sequencing attemps, and I got some more reads, and I got a bigger asembly, 15976, when the one I got first was 15959. Unless the algorithm is deterministic and not stochastic/random, I should get the same result. Thus, I got the polish assembly graph: Ju760.Jul28.polish.min500.gfa, and tried again to repolish it:

minipolish -t 12 Ju760.Jul28.mapped.min100.max15.fastq Ju760.Jul28.polish.min500.gfa > Ju760.Jul28.polish.min500.1.gfa

and I got an error: Checking requirements Minipolish requires Minimap2 and Racon to run, so it checks for these tools now.

Minimap2 found: /home/juaguila/miniconda3/envs/pomoxis/bin/minimap2 (v2.22-r1101) Racon found: /home/juaguila/miniconda3/envs/pomoxis/bin/racon (v1.4.20)

Loading graph Loading the miniasm GFA graph into memory.

Ju760.Jul28.polish.min500.gfa 1 segments (15,976 bp) 2 links

Initial polishing round The first round of polishing is done on a per-segment basis and only uses reads which are definitely associated with the segment (because the GFA indicated that they were used to make the segment).

Error: could not find /tmp/tmpabf382_o/utg000001c_reads.fastq

Could you tell why this is happening?

Thanks;

desmodus1984 avatar Jul 28 '24 21:07 desmodus1984

It seems that Minipolish is having issues getting reads from its temporary file. Maybe you don't have read/write permissions for /tmp or something like that? You could try setting your TMPDIR environment variable to somewhere else (where you definitely have read/write permissions).

If that doesn't help, can you upload the reads here or email them to me and I'll see if I can reproduce the problem?

rrwick avatar Jul 29 '24 00:07 rrwick

I am seeing a (I think) similar problem on an E. faecium assembly

I ran it using the autocycler helper

autocycler helper miniasm --reads subsampled_reads/sample_04.fastq --out_prefix assemblies/miniasm_04 --threads 16 --genome_size 2700272 --read_type ont_r10 --min_depth_rel 0.1

Here is the end of the stderr

Checking requirements
    Minipolish requires Minimap2 and Racon to run, so it checks for these tools
now.

Minimap2 found: /home/uqmhal11/sw/miniforge3/envs/autocycler/bin/minimap2 (v2.28-r1209)
Racon found:    /home/uqmhal11/sw/miniforge3/envs/autocycler/bin/racon (v1.5.0)


Loading graph
    Loading the miniasm GFA graph into memory.

/scratch/temp/17804485/.tmp6rLxBV/unpolished.gfa
  6 segments (3,266,504 bp)
  6 links


Initial polishing round
    The first round of polishing is done on a per-segment basis and only uses
reads which are definitely associated with the segment (because the GFA
indicated that they were used to make the segment).

Running Racon on utg000001l:
  reads:      /scratch/temp/17804485/tmpcc55tjvw/utg000001l_reads.fastq (344 reads)
  input:      /scratch/temp/17804485/tmpcc55tjvw/utg000001l.fasta (939,027 bp)
  alignments: /scratch/temp/17804485/tmpcc55tjvw/utg000001l.paf (351 alignments)
  output:     /scratch/temp/17804485/tmpcc55tjvw/utg000001l_polished.fasta (0 bp)

Removing empty segment: utg000001l

Running Racon on utg000002l:
  reads:      /scratch/temp/17804485/tmpcc55tjvw/utg000002l_reads.fastq (699 reads)
  input:      /scratch/temp/17804485/tmpcc55tjvw/utg000002l.fasta (1,902,938 bp)
  alignments: /scratch/temp/17804485/tmpcc55tjvw/utg000002l.paf (730 alignments)
  output:     /scratch/temp/17804485/tmpcc55tjvw/utg000002l_polished.fasta (0 bp)

Removing empty segment: utg000002l

Running Racon on utg000003l:
  reads:      /scratch/temp/17804485/tmpcc55tjvw/utg000003l_reads.fastq (94 reads)
  input:      /scratch/temp/17804485/tmpcc55tjvw/utg000003l.fasta (249,424 bp)
  alignments: /scratch/temp/17804485/tmpcc55tjvw/utg000003l.paf (102 alignments)
  output:     /scratch/temp/17804485/tmpcc55tjvw/utg000003l_polished.fasta (0 bp)

Removing empty segment: utg000003l

Running Racon on utg000004l:
  reads:      /scratch/temp/17804485/tmpcc55tjvw/utg000004l_reads.fastq (46 reads)
  input:      /scratch/temp/17804485/tmpcc55tjvw/utg000004l.fasta (124,910 bp)
  alignments: /scratch/temp/17804485/tmpcc55tjvw/utg000004l.paf (46 alignments)
  output:     /scratch/temp/17804485/tmpcc55tjvw/utg000004l_polished.fasta (0 bp)

Removing empty segment: utg000004l

Running Racon on utg000005l:
  reads:      /scratch/temp/17804485/tmpcc55tjvw/utg000005l_reads.fastq (13 reads)
  input:      /scratch/temp/17804485/tmpcc55tjvw/utg000005l.fasta (44,105 bp)
  alignments: /scratch/temp/17804485/tmpcc55tjvw/utg000005l.paf (13 alignments)
  output:     /scratch/temp/17804485/tmpcc55tjvw/utg000005l_polished.fasta (0 bp)

Removing empty segment: utg000005l

Running Racon on utg000006c:
  reads:      /scratch/temp/17804485/tmpcc55tjvw/utg000006c_reads.fastq (2 reads)
  input:      /scratch/temp/17804485/tmpcc55tjvw/utg000006c.fasta (6,100 bp)
  alignments: /scratch/temp/17804485/tmpcc55tjvw/utg000006c.paf (7 alignments)
  output:     /scratch/temp/17804485/tmpcc55tjvw/utg000006c_polished.fasta (0 bp)

Removing empty segment: utg000006c



Full polishing rounds
    The assembly graph is now polished using all of the reads. Multiple rounds
of polishing are done, and circular contigs are rotated between rounds.


Running Racon on round_1:
  reads:      subsampled_reads/sample_04.fastq (33,696 reads)
  input:      /scratch/temp/17804485/tmpcc55tjvw/round_1.fasta (0 bp)
  alignments: /scratch/temp/17804485/tmpcc55tjvw/round_1.paf (0 alignments)
  output:     /scratch/temp/17804485/tmpcc55tjvw/round_1_polished.fasta (0 bp)


Running Racon on round_2:
  reads:      subsampled_reads/sample_04.fastq (33,696 reads)
  input:      /scratch/temp/17804485/tmpcc55tjvw/round_2.fasta (0 bp)
  alignments: /scratch/temp/17804485/tmpcc55tjvw/round_2.paf (0 alignments)
  output:     /scratch/temp/17804485/tmpcc55tjvw/round_2_polished.fasta (0 bp)


Assign read depths
    The reads are aligned to the contigs one final time to calculate read depth
values.

Aligning reads:
  reads:      subsampled_reads/sample_04.fastq (33,696 reads)
  contigs:    /scratch/temp/17804485/tmpcc55tjvw/depths.fasta (0 bp)
  alignments: /scratch/temp/17804485/tmpcc55tjvw/depths.paf (0 alignments)
  mean depth: 0.000x

I have noticed this happens semi-regularly on a collection of 22 samples I have been assembling lately. What I get at the end is an empty GFA file and, of course, no assembly.

mbhall88 avatar Oct 28 '25 23:10 mbhall88