Flye
HiFi and high quality ultra-long ONT
Hello sir,
Long time fan and user of Flye, Abruijn, etc.
I have three datasets:
- PacBio HiFi
- 90-95% accurate ultra-long ONT
- Hi-C
I know there are other assemblers that may out-perform Flye with these datasets, but I am having trouble with them:
- I have run Verkko on a subset of the data, and got it to finish, but am having trouble with it on all the data.
- I am also trying to get HiFiasm to finish on even a subset of the data, but I am running into time-limit and memory constraints there too.
Thus, I would like to see what Flye can do here. Perhaps we will use the Flye assembly as is, but I am also wondering if it could be used as a data compression step. For example, I could use the Flye assembly in combination with a smaller subset of data with one of the assemblers above. Just riffing here - I know others would poo-poo such an idea.
So, my question is: Do you have a recommended pipeline to make use of all three datasets, or at least the first two?
I know the FAQ answers a related version of this question, but for older data types:
Can I use both PacBio and ONT reads for assembly?
You can do this as follows: first, run the pipeline with all your reads in the --pacbio-raw mode (you can specify multiple files, no need to merge all your reads into one). Also add --iterations 0 to stop the pipeline before polishing.
Once the assembly finishes, run polishing using either PacBio or ONT reads only. Use the same assembly options, but add --resume-from polishing. Here is an example of a script that should do the job (thanks to @jvhaarst):
flye --pacbio-raw $PBREADS $ONTREADS --iterations 0 --out-dir $OUTPUTDIR --genome-size $SIZE --threads $THREADS
flye --pacbio-raw $PBREADS --resume-from polishing --out-dir $OUTPUTDIR --genome-size $SIZE --threads $THREADS
Would it be recommended to swap out the --pacbio-raw flag for --pacbio-hifi?
flye --pacbio-hifi $PBREADS $ONTREADS --iterations 0 --out-dir $OUTPUTDIR --genome-size $SIZE --threads $THREADS
flye --pacbio-hifi $PBREADS --resume-from polishing --out-dir $OUTPUTDIR --genome-size $SIZE --threads $THREADS
Or maybe treat both as --nano-hq, followed by --pacbio-hifi polishing?
flye --nano-hq $PBREADS $ONTREADS --iterations 0 --out-dir $OUTPUTDIR --genome-size $SIZE --threads $THREADS
flye --pacbio-hifi $PBREADS --resume-from polishing --out-dir $OUTPUTDIR --genome-size $SIZE --threads $THREADS
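To lay the two options above side by side without one run overwriting the other, a dry-run wrapper along these lines might help. This is only a sketch: `run_cmd`, `two_step`, and the `DRY_RUN` variable are placeholder names for illustration, not Flye features, and the only flye flags used are the ones already shown above. Whether Flye accepts resuming with a different read-type flag than the one used for assembly is exactly the open question here, so this organizes the runs and is not a tested recipe.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (or run) the two-step Flye commands for one option.
# DRY_RUN, run_cmd, and two_step are hypothetical helpers, not part of Flye.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands; 0 = actually execute them

run_cmd() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# two_step ASM_MODE POLISH_MODE OUTDIR
# Step 1: assemble PBREADS + ONTREADS in ASM_MODE with polishing disabled.
# Step 2: resume at the polishing stage using PBREADS only, in POLISH_MODE.
two_step() {
    local asm_mode=$1 polish_mode=$2 outdir=$3
    run_cmd flye "$asm_mode" "$PBREADS" "$ONTREADS" --iterations 0 \
        --out-dir "$outdir" --genome-size "$SIZE" --threads "$THREADS"
    run_cmd flye "$polish_mode" "$PBREADS" --resume-from polishing \
        --out-dir "$outdir" --genome-size "$SIZE" --threads "$THREADS"
}
```

With PBREADS, ONTREADS, SIZE, and THREADS set, `two_step --pacbio-hifi --pacbio-hifi out_hifi_mode` and `two_step --nano-hq --pacbio-hifi out_nanohq_mode` print the commands for each option, each with its own output directory (the directory names are arbitrary).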
Or perhaps even some type of three- or four-step procedure, using intermediate assemblies as part of the input for the final assembly:
# hifi asm
flye --pacbio-hifi $PBREADS --iterations 0 --out-dir $OUTPUTDIR --genome-size $SIZE --threads $THREADS
# nano hq asm
flye --nano-hq $PBREADS $ONTREADS --iterations 0 --out-dir $OUTPUTDIR --genome-size $SIZE --threads $THREADS
# combined asm (either --pacbio-hifi or --nano-hq flag)
flye --pacbio-hifi $PBREADS $ONTREADS $HIFIASM $NANOASM --resume-from polishing --out-dir $OUTPUTDIR --genome-size $SIZE --threads $THREADS
# polishing
flye --pacbio-hifi $PBREADS --resume-from polishing --out-dir $OUTPUTDIR --genome-size $SIZE --threads $THREADS
Any thoughts would be appreciated.
Best,
John
p.s. I suppose correcting the nanopore reads with Herro first is a possibility too.
p.p.s. As for the Hi-C data, I know Flye doesn't take it directly. Do you recommend a particular Hi-C scaffolder for Flye assemblies, and are there any clean-up steps you recommend before running it?