HiFi and high quality ultra-long ONT

Open JohnUrban opened this issue 4 months ago • 7 comments

Hello sir,

Long time fan and user of Flye, Abruijn, etc.

I have three datasets:

  • PacBio HiFi
  • 90-95% accurate ultra-long ONT
  • Hi-C

I know there are other assemblers that may out-perform Flye with these datasets, but I am having trouble with them:

  • I have run Verkko on a subset of the data, and got it to finish, but am having trouble with it on all the data.
  • I also am trying to get HiFiasm to finish on even a subset of the data, but due to time limit (and memory) constraints, I am facing troubles there too.

Thus, I would like to see what Flye can do here. Perhaps we will use the Flye assembly as is, but I am also wondering if it could be used as a data compression step. For example, I could use the Flye assembly in combination with a smaller subset of data with one of the assemblers above. Just riffing here - I know others would pooh-pooh such an idea.

So, my question is: Do you have a recommended pipeline to make use of all three datasets, or at least the first two?

I know the FAQ answers a related version of this question, but for older data types:

Can I use both PacBio and ONT reads for assembly?
You can do this as follows: first, run the pipeline with all your reads in the --pacbio-raw mode (you can specify multiple files, no need to merge all your reads into one). Also add --iterations 0 to stop the pipeline before polishing.

Once the assembly finishes, run polishing using either PacBio or ONT reads only. Use the same assembly options, but add --resume-from polishing. Here is an example of a script that should do the job (thanks to @jvhaarst):

flye --pacbio-raw $PBREADS $ONTREADS --iterations 0 --out-dir $OUTPUTDIR --genome-size $SIZE --threads $THREADS
flye --pacbio-raw $PBREADS --resume-from polishing --out-dir $OUTPUTDIR  --genome-size $SIZE --threads $THREADS

Would it be recommended to swap out the --pacbio-raw flag for --pacbio-hifi?

flye --pacbio-hifi $PBREADS $ONTREADS --iterations 0 --out-dir $OUTPUTDIR --genome-size $SIZE --threads $THREADS
flye --pacbio-hifi $PBREADS --resume-from polishing --out-dir $OUTPUTDIR  --genome-size $SIZE --threads $THREADS

Or maybe treat both as --nano-hq, followed by --pacbio-hifi polishing:

flye --nano-hq $PBREADS $ONTREADS --iterations 0 --out-dir $OUTPUTDIR --genome-size $SIZE --threads $THREADS
flye --pacbio-hifi $PBREADS --resume-from polishing --out-dir $OUTPUTDIR  --genome-size $SIZE --threads $THREADS
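The two-step variants above differ only in the read-type flag used for assembly and the one used for polishing. Here is a minimal dry-run sketch that prints the pair of commands for a given choice of flags without running Flye. The helper name two_step_cmds and the file names are hypothetical placeholders, and whether mixing HiFi and ONT reads under a single flag is advisable is exactly the open question:

```shell
#!/usr/bin/env bash
# Dry run: print (do not execute) the assemble-then-polish pair of Flye commands.
# two_step_cmds is a hypothetical helper; paths and sizes are placeholders.
PBREADS=hifi.fq.gz
ONTREADS=ont.fq.gz
OUTPUTDIR=flye_out
SIZE=1g
THREADS=16

two_step_cmds() {
    local asm_flag="$1"     # read-type flag for assembly, e.g. --pacbio-hifi or --nano-hq
    local polish_flag="$2"  # read-type flag for polishing, e.g. --pacbio-hifi
    local common="--out-dir ${OUTPUTDIR} --genome-size ${SIZE} --threads ${THREADS}"
    # Step 1: assemble all reads together, stopping before polishing.
    echo "flye ${asm_flag} ${PBREADS} ${ONTREADS} --iterations 0 ${common}"
    # Step 2: resume in the same output directory, polishing with the PacBio reads only.
    echo "flye ${polish_flag} ${PBREADS} --resume-from polishing ${common}"
}

two_step_cmds --nano-hq --pacbio-hifi
```

Swapping the first argument between --pacbio-raw, --pacbio-hifi and --nano-hq reproduces each of the variants I listed, which makes it easier to launch them side by side and compare.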

Or perhaps even some type of 3 or 4 step procedure, using intermediate assemblies as part of the input for the final assembly:

# hifi asm
flye --pacbio-hifi $PBREADS  --iterations 0 --out-dir $OUTPUTDIR --genome-size $SIZE --threads $THREADS

# nano hq asm
flye --nano-hq $PBREADS $ONTREADS --iterations 0 --out-dir $OUTPUTDIR --genome-size $SIZE --threads $THREADS

# combined asm (either --pacbio-hifi or --nano-hq flag)
flye --pacbio-hifi $PBREADS $ONTREADS $HIFIASM $NANOASM --iterations 0 --out-dir $OUTPUTDIR --genome-size $SIZE --threads $THREADS

#polishing
flye --pacbio-hifi $PBREADS --resume-from polishing --out-dir $OUTPUTDIR  --genome-size $SIZE --threads $THREADS
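One practical detail of the multi-step idea: if every stage writes to the same $OUTPUTDIR, the intermediate assemblies would be overwritten before the combined stage can read them. Here is a dry-run sketch with one output directory per stage. The subdirectory names are hypothetical, it assumes each run leaves its contigs in assembly.fasta inside its output directory, and the combined stage is written with --iterations 0 on the assumption that it is meant to assemble rather than polish. Whether Flye handles assemblies passed as reads well is part of what I am asking:

```shell
#!/usr/bin/env bash
# Dry run: print the staged commands, one output directory per stage, so each
# stage's assembly.fasta survives to be used as input to the combined stage.
# Directory and file names are hypothetical; nothing here runs Flye.
PBREADS=hifi.fq.gz
ONTREADS=ont.fq.gz
OUTPUTDIR=flye_out
SIZE=1g
THREADS=16
COMMON="--genome-size ${SIZE} --threads ${THREADS}"

# Assumed locations of the intermediate assemblies.
HIFIASM="${OUTPUTDIR}/hifi/assembly.fasta"
NANOASM="${OUTPUTDIR}/nano/assembly.fasta"

staged_cmds() {
    # hifi asm
    echo "flye --pacbio-hifi ${PBREADS} --iterations 0 --out-dir ${OUTPUTDIR}/hifi ${COMMON}"
    # nano hq asm (same read inputs as in the post)
    echo "flye --nano-hq ${PBREADS} ${ONTREADS} --iterations 0 --out-dir ${OUTPUTDIR}/nano ${COMMON}"
    # combined asm: reads plus both intermediate assemblies
    echo "flye --pacbio-hifi ${PBREADS} ${ONTREADS} ${HIFIASM} ${NANOASM} --iterations 0 --out-dir ${OUTPUTDIR}/combined ${COMMON}"
    # polishing: resume in the combined directory with the PacBio reads only
    echo "flye --pacbio-hifi ${PBREADS} --resume-from polishing --out-dir ${OUTPUTDIR}/combined ${COMMON}"
}

staged_cmds
```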

Any thoughts would be appreciated.

Best,

John

p.s. I suppose the nanopore reads could be corrected with Herro as a possibility too.

p.p.s. As for the Hi-C data, I know Flye doesn't take it directly. Do you recommend a particular Hi-C scaffolder for Flye assemblies, and are there clean-up steps/etc recommended prior to using it?

JohnUrban, Oct 14 '24 15:10