Hybrid Assembly using falcon

Open prince26121991 opened this issue 9 years ago • 8 comments

I want to ask: if I have 20X Illumina-corrected PacBio reads, should I use FALCON for overlap graph construction? I ask because https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/Large-Genome-Assembly-with-PacBio-Long-Reads says that it is for PacBio-only reads. Also, can you point me to a resource or publication that explains what FALCON does in the background? The manual at https://github.com/PacificBiosciences/FALCON/wiki/Manual focuses on technical detail rather than theory.

prince26121991 avatar Feb 17 '16 13:02 prince26121991

Hi, @prince26121991, the general design of FALCON follows this publication: http://www.nature.com/nmeth/journal/v10/n6/full/nmeth.2474.html You can also check https://speakerdeck.com/jchin/string-graph-assembly-for-diploid-genomes-with-long-reads and https://speakerdeck.com/jchin/de-novo-diploid-genome-assembly-and-haplotype-sequence-reconstruction (FALCON itself does not separate the haplotypes, though).

pb-jchin avatar Feb 17 '16 14:02 pb-jchin

I had already seen these publications, sir. Earlier, when I worked with HGAP, I had 20X coverage of raw reads, and at the Celera step only 8X coverage remained after correction, which was not sufficient for Celera. Later I added Illumina reads and also increased the overall coverage of raw PacBio reads, so now I have 70X short reads to correct 42X PacBio reads, which became 20X after hybrid error correction. My question is whether I can use FALCON on those 20X corrected PacBio reads.

prince26121991 avatar Feb 17 '16 17:02 prince26121991

42x PacBio coverage is good enough for at least a decent FALCON assembly on its own. I'd set your length_cutoff to the read length above which you have 30x coverage and just do a PacBio-only run.
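
The rule of thumb above (pick the read length above which the reads still give 30x coverage) can be sketched in a few lines of Python. This is only an illustration, not FALCON's own code: the function name `length_cutoff_for_coverage` and the toy numbers are made up for the example, and it assumes you already have a list of raw read lengths and a genome size estimate.

```python
def length_cutoff_for_coverage(read_lengths, genome_size, target_coverage):
    """Return the read length L such that reads of length >= L
    together contain at least genome_size * target_coverage bases.

    Hypothetical helper for illustration; not part of FALCON.
    """
    target_bases = genome_size * target_coverage
    total = 0
    # Walk from the longest read down, accumulating bases until
    # the target amount of sequence has been collected.
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target_bases:
            return length
    raise ValueError("not enough bases to reach the target coverage")


# Toy data: five reads against a 1 kb "genome", asking for 20x.
reads = [10000, 8000, 6000, 4000, 2000]
print(length_cutoff_for_coverage(reads, genome_size=1000, target_coverage=20))
```

With real data, the read lengths would come from your subread FASTA files, and the returned value is what you would try as length_cutoff.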

mseetin avatar Feb 17 '16 17:02 mseetin

@prince26121991 FALCON is not designed to handle hybrid-corrected reads. While it works for some error-corrected reads if you set the input type to "preads", you need to be careful. The later stages of FALCON have no explicit chimera or artifact removal mechanism. (In the FALCON design, we push all of that machinery into the earlier error correction stage.) It works if the error-corrected reads are "correct". We cannot be 100% sure what kinds of artifacts are in the hybrid-corrected reads. So you can try, but it is hard to say what might happen. To some degree, Celera Assembler (or Canu) will handle the hybrid reads better, as there is still an artifact removal stage there.

pb-jchin avatar Feb 17 '16 18:02 pb-jchin

@pb-jchin Do you agree with @mseetin that 42X is enough for a decent FALCON assembly? It's a diploid genome and highly repetitive.

prince26121991 avatar Mar 11 '16 06:03 prince26121991

Whether 42x is enough will depend on the raw accuracy and the genome size. It's worth trying.

With the latest FALCON, you can set:

  • genome_size= (your best guess),
  • seed_coverage=30 or =42 or whatever you want, and
  • length_cutoff=-1

The length_cutoff will then be calculated for you at runtime.

This feature is not yet documented, but we use it within PacBio regularly.
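
As a rough illustration of how these three settings fit together, here is a small Python sketch that reads them from a config fragment and works out how many seed bases the automatic cutoff would need to cover. It is not FALCON's implementation: the `[General]` section name, the genome size of 500 Mb, and the helper `seed_base_target` are assumptions for the example, and the idea that the target is genome_size times seed_coverage is an inference from the option names.

```python
import configparser

# Example fragment; the section name and genome_size value are placeholders.
CFG_TEXT = """
[General]
genome_size = 500000000
seed_coverage = 30
length_cutoff = -1
"""


def seed_base_target(cfg_text):
    """Return genome_size * seed_coverage when length_cutoff is -1,
    or None when an explicit length_cutoff was given.

    Hypothetical helper for illustration; not part of FALCON.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    general = parser["General"]
    if general.getint("length_cutoff") != -1:
        return None  # explicit cutoff, nothing to auto-calculate
    return general.getint("genome_size") * general.getint("seed_coverage")


print(seed_base_target(CFG_TEXT))
```

The cutoff itself would then be the read length at which the longest reads add up to that many bases.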

pb-cdunn avatar Mar 11 '16 17:03 pb-cdunn

Thank you for the additional information about the settings. In this case, should one also use length_cutoff_pr = -1 ?

ademcan avatar Sep 29 '16 07:09 ademcan

No. We don't auto-calculate length_cutoff_pr yet, but you can choose something conservative, at the short end.

pb-cdunn avatar Sep 29 '16 14:09 pb-cdunn