
Cannot resolve link between two chromosomes: any other parameters I can adjust?

Open · zacksaud opened this issue 2 years ago · 2 comments

Hi,

I have sequenced a haploid fungal genome (about 35 Mb in total, 7 chromosomes plus 1 mitochondrial genome) using both Nanopore (R10.4.1, Kit 14 ligation) and Illumina (150 bp) reads. Before assembling with Flye, I removed middle adapters with split_on_adapter from duplex-tools, ran Porechop, corrected the long reads with the short reads using FMLRC2, and then ran Canu in trim-only mode.

The closest I have come to a whole genome is with this command:

flye --nano-corr Reads.fasta -i 3 --scaffold --trestle -m 6000 --read-err 0.0015 --no-alt-contigs -o Output

Despite this, the assembly graph shows a 47,705 bp region shared between two chromosomes that cannot be resolved:

[Screenshot from 2022-10-06 09-35-49: assembly graph]

I have tried many combinations of -m and --read-err values, but these two chromosomes are still not resolved. Are there any other Flye parameters I could change that might help resolve this, or any other tools you would suggest?

Many thanks in advance

Zack

zacksaud, Oct 06 '22 08:10

Also worth noting: the connecting piece has much lower coverage in the assembly graph than the other regions.

[Screenshot from 2022-10-06 10-04-05: coverage in the assembly graph]
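One way to put a number on the lower coverage of the connecting piece is to read the segment (S) lines of Flye's assembly_graph.gfa and compare each segment's depth with the length-weighted mean. This is a minimal sketch, assuming each S line carries a depth tag named "dp"; that tag name is an assumption, so check the tags in your own file and pass a different name if needed. The function names are hypothetical, and the toy graph uses made-up depths rather than values from this assembly.

```python
import io
from typing import Dict, Iterable, Tuple


def parse_segments(lines: Iterable[str], depth_tag: str = "dp") -> Dict[str, Tuple[int, float]]:
    """Return {segment_name: (length_bp, depth)} for every S (segment) line.

    Assumption: each S line carries a depth tag named `depth_tag`; a missing
    tag raises KeyError instead of being silently ignored.
    """
    segments: Dict[str, Tuple[int, float]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "S":
            continue  # skip header (H), link (L) and other record types
        name, seq = fields[1], fields[2]
        # Optional fields have the form TAG:TYPE:VALUE, e.g. LN:i:47705
        tags = {}
        for field in fields[3:]:
            key, _type, value = field.split(":", 2)
            tags[key] = value
        # Prefer the explicit LN tag; fall back to the stored sequence length.
        if "LN" in tags:
            length = int(tags["LN"])
        elif seq != "*":
            length = len(seq)
        else:
            raise ValueError(f"segment {name} has neither a sequence nor an LN tag")
        segments[name] = (length, float(tags[depth_tag]))
    return segments


def relative_depth(segments: Dict[str, Tuple[int, float]], name: str) -> float:
    """Depth of one segment divided by the length-weighted mean depth of all segments."""
    total_len = sum(length for length, _ in segments.values())
    mean = sum(length * depth for length, depth in segments.values()) / total_len
    return segments[name][1] / mean


if __name__ == "__main__":
    # Toy graph with made-up depths, only to show the call pattern
    toy_gfa = io.StringIO(
        "H\tVN:Z:1.0\n"
        "S\tedge_1\t*\tLN:i:1000000\tdp:i:40\n"
        "S\tedge_2\t*\tLN:i:47705\tdp:i:12\n"
        "L\tedge_1\t+\tedge_2\t+\t0M\n"
    )
    segs = parse_segments(toy_gfa)
    print(segs["edge_2"])  # (47705, 12.0)
    print(round(relative_depth(segs, "edge_2"), 2))
```

A relative depth well below 1 for the shared segment would match what the screenshot shows; weighting by length keeps many short edges from dominating the mean.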

I have tried mapping all reads to that 47,705 bp piece and assembling with that region removed, but this just splits the 2 chromosomes into 4 contigs.

I can provide any additional information you need, and any help would be greatly appreciated.

Best

Zack

zacksaud, Oct 06 '22 09:10

Hi Zack,

It looks like there may be a piece of sequence shared by the two chromosomes at their ends, which causes the chromosomes to be merged in the graph. There are two possibilities. If the shared sequence sits at the very end of both chromosomes, there is no unique sequence beyond it, so no reads can span it and separate the repeat copies. Alternatively, it may be due to the length: if the repeat is indeed ~47 kb, there may be no reads long enough to span it.
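The "no reads span it" case can be checked roughly by counting reads that cover the whole repeat plus some unique flanking sequence on both sides. This is a sketch under stated assumptions: the helper names are hypothetical, the 1,000 bp flank is an arbitrary anchoring requirement (not a Flye parameter), and alignment coordinates would come from mapping reads to a reference or draft that contains the repeat. It does not model how Flye itself resolves repeats.

```python
from typing import Iterable, Tuple

# Assumed minimum amount of unique sequence a read must cover on each side
# of the repeat to anchor it (arbitrary choice, not taken from Flye).
DEFAULT_FLANK = 1000


def count_spanning(alignments: Iterable[Tuple[int, int]], repeat_start: int,
                   repeat_end: int, flank: int = DEFAULT_FLANK) -> int:
    """Count read alignments [start, end) that cover the whole repeat
    [repeat_start, repeat_end) plus `flank` bp on both sides."""
    return sum(
        1
        for start, end in alignments
        if start <= repeat_start - flank and end >= repeat_end + flank
    )


def count_long_enough(read_lengths: Iterable[int], repeat_len: int,
                      flank: int = DEFAULT_FLANK) -> int:
    """Upper bound on spanning reads: reads at least repeat_len + 2 * flank long."""
    needed = repeat_len + 2 * flank
    return sum(1 for n in read_lengths if n >= needed)


if __name__ == "__main__":
    repeat_len = 47_705  # size of the shared region reported in the issue
    toy_lengths = [12_000, 35_000, 48_000, 52_000, 60_000]  # made-up read lengths
    print(count_long_enough(toy_lengths, repeat_len))  # 2

    # Made-up alignments; the repeat is placed at 100,000-147,705 for illustration
    toy_alignments = [(98_000, 150_000), (99_500, 160_000), (90_000, 148_000)]
    print(count_spanning(toy_alignments, 100_000, 147_705))  # 1
```

If the repeat runs right up to the chromosome end, repeat_end + flank lies past the end of the sequence, so count_spanning returns 0 by construction; that is the first case described above, where no read length is enough.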

Contig sequences typically extend into repeats, so I would check whether the corresponding contigs contain telomeric sequence. If they do, your assembly is already complete. If not, the repeat is likely longer than your reads, which limits the assembly contiguity.
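The telomere check suggested above can be sketched by counting copies of the telomeric repeat, on both strands, near each end of a contig (for example, contigs taken from Flye's assembly.fasta). This assumes the motif is TTAGGG, which is common in filamentous fungi but should be verified for your species, and an arbitrary 1,000 bp window at each end; the function names are hypothetical.

```python
# Assumption: TTAGGG is the telomeric repeat (common in filamentous fungi,
# but verify for your species). Helper names are hypothetical.
TELOMERE_MOTIF = "TTAGGG"


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count, pos = 0, seq.find(motif)
    while pos != -1:
        count += 1
        pos = seq.find(motif, pos + 1)
    return count


def telomere_counts(contig: str, motif: str = TELOMERE_MOTIF, window: int = 1000) -> dict:
    """Motif hits on both strands in the first and last `window` bp of a contig.

    Both strands are counted at each end, so the result does not depend on
    the orientation in which the contig was output.
    """
    contig, motif = contig.upper(), motif.upper()
    rc = revcomp(motif)
    ends = {"start": contig[:window], "end": contig[-window:]}
    return {name: count_motif(s, motif) + count_motif(s, rc) for name, s in ends.items()}


if __name__ == "__main__":
    # Made-up contig with telomere-like repeats at both ends
    toy = "CCCTAA" * 20 + "ACGT" * 500 + "TTAGGG" * 20
    print(telomere_counts(toy))  # {'start': 20, 'end': 20}
    print(telomere_counts("ACGT" * 600))  # {'start': 0, 'end': 0}
```

Many hits at both ends of both contigs would point to the "already complete" case; hits at only one end of each would fit a shared repeat that the reads cannot span.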

I would also recommend running with --nano-hq on reads without error correction. This is how we normally run and test Flye.
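For anyone scripting reruns, the recommended setup (uncorrected reads, --nano-hq mode) might be wrapped as below. This is a sketch: the read and output file names are hypothetical placeholders, the thread count is an arbitrary choice, and only the standard --nano-hq, --out-dir and --threads options are used. The command is only printed unless "--run" is passed on the command line.

```python
import shlex
import subprocess
import sys


def flye_nano_hq_cmd(reads: str, out_dir: str, threads: int = 8) -> list:
    """Build a Flye command for uncorrected ONT reads in --nano-hq mode."""
    return ["flye", "--nano-hq", reads, "--out-dir", out_dir, "--threads", str(threads)]


if __name__ == "__main__":
    # Hypothetical file names: point these at the raw (uncorrected) reads
    cmd = flye_nano_hq_cmd("raw_reads.fastq.gz", "flye_nano_hq")
    print(shlex.join(cmd))
    if "--run" in sys.argv:
        subprocess.run(cmd, check=True)  # requires flye on PATH
```

Passing the arguments as a list avoids shell quoting issues; shlex.join is only used to show the equivalent command line.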

Best, Mikhail

mikolmogorov, Oct 15 '22 15:10