
consensus-calling with arrow for contigs absent from all_p_ctg.fa but present in p_ctg.fa after falcon_unzip

sjin09 opened this issue 7 years ago • 1 comment

I have successfully run FALCON (PacificBiosciences/FALCON#514) on the human genome and am now running FALCON_UNZIP. FALCON_UNZIP also completed successfully, but some contigs are missing from its output because their graph is circular and path extraction returns an empty path (#20).

Here are the assembly statistics for p_ctg.fa:

number_of_contigs: 3,904
contig_N50: 24,379,051 bp
minimum_contig_length: 17 bp
maximum_contig_length: 109,706,220 bp
assembly_length: 2,892,837,735 bp

The assembly statistics for all_p_ctg.fa:

number_of_contigs: 2,253
contig_N50: 24,379,667 bp
minimum_contig_length: 3,540 bp
maximum_contig_length: 109,710,721 bp
assembly_length: 2,857,052,564 bp
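For reference, statistics like those listed above can be recomputed directly from the FASTA files. This is a minimal sketch, not the tool that produced the numbers: it assumes the contig ID is the first whitespace-delimited token of each header line and uses the common N50 definition (the length L such that contigs of length ≥ L together cover at least half of the total assembly length).

```python
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each contig ID (first token of the header) to its sequence length."""
    lengths: Dict[str, int] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # integer form of covered >= total / 2
            return length
    return 0  # empty input


def assembly_stats(lengths: List[int]) -> Dict[str, int]:
    """Summary fields in base pairs; expects at least one contig."""
    return {
        "number_of_contigs": len(lengths),
        "contig_N50": n50(lengths),
        "minimum_contig_length": min(lengths),
        "maximum_contig_length": max(lengths),
        "assembly_length": sum(lengths),
    }
```

Applied to a file, this would be called as `assembly_stats(list(read_fasta_lengths(open("p_ctg.fa")).values()))`. Small differences from the reported values could come from how the original tool handles headers or ambiguous bases.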

I would like to include some of the circular contigs in consensus calling with arrow, and would welcome recommendations for this case. My idea was to use 2-asm-falcon/read_maps/read_to_contig_map to select the reads that map to the circular contigs, and then run arrow with only those reads on just these contigs.
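The read selection described above could be sketched as follows. The column layout of read_to_contig_map is an assumption here: the sketch treats each line as whitespace-separated fields and takes the read name and contig ID from the hypothetical column positions `read_col` and `contig_col`, which must be checked against the actual file before use.

```python
from typing import Iterable, Set


def reads_for_contigs(
    map_lines: Iterable[str],
    wanted_contigs: Set[str],
    read_col: int = 1,    # ASSUMPTION: column holding the read name; verify against the file
    contig_col: int = 2,  # ASSUMPTION: column holding the contig ID; verify against the file
) -> Set[str]:
    """Collect the names of reads assigned to any contig in wanted_contigs."""
    reads: Set[str] = set()
    needed = max(read_col, contig_col) + 1
    for line in map_lines:
        fields = line.split()
        if len(fields) < needed:
            continue  # skip blank or malformed lines
        if fields[contig_col] in wanted_contigs:
            reads.add(fields[read_col])
    return reads
```

Writing the returned names one per line gives a read whitelist that could be used to subset the subreads before running arrow on the circular contigs alone.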

I would also like to ask about contigs that are present in p_ctg.fa but completely absent from all_p_ctg.fa. Is it correct to assume that they have all been incorporated into all_h_ctg.fa? If not, what filtering mechanism removes them?
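One way to check the all_h_ctg.fa question empirically is to compare the contig IDs of the three files. The sketch below lists every p_ctg.fa ID absent from all_p_ctg.fa, together with any all_h_ctg.fa IDs that share its prefix. It assumes that IDs are the first header token, that unzipped primary contigs keep their p_ctg.fa IDs, and that haplotigs are named `<primary_id>_<suffix>` (e.g. `000123F_001`); all three assumptions should be verified against the actual output.

```python
from typing import Dict, Iterable, List, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the first whitespace-delimited token of every FASTA header line."""
    ids: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def missing_with_haplotigs(
    p_ids: Set[str], all_p_ids: Set[str], all_h_ids: Set[str]
) -> Dict[str, List[str]]:
    """Map each p_ctg ID missing from all_p_ctg to the haplotig IDs sharing its prefix."""
    result: Dict[str, List[str]] = {}
    for ctg in sorted(p_ids - all_p_ids):
        # ASSUMPTION: haplotigs are named <primary_id>_<suffix>
        result[ctg] = sorted(h for h in all_h_ids if h.startswith(ctg + "_"))
    return result
```

Under these naming assumptions, an empty list marks a contig that appears in neither all_p_ctg.fa nor all_h_ctg.fa, i.e. one that was dropped rather than moved into the haplotigs.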

I have also found that many of these absent or empty contigs contain centromeric sequence.

Best, Jin

sjin09, Mar 31 '17 00:03

I have also observed a number of contigs whose sequences changed significantly between FALCON and FALCON_UNZIP. I have uploaded a dot plot illustrating one example: the horizontal sequence is from FALCON and the vertical sequence is from FALCON_UNZIP.

[Dot plot: 000479f]
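A dot plot of this kind can be approximated by plotting exact k-mer matches between the two versions of a contig. The sketch below is a generic k-mer dot plot, not necessarily how the uploaded figure was made; `k` is an illustrative choice, and because every k-mer is held in memory it is only practical for regions of modest size. For whole contigs of tens of megabases, a dedicated whole-genome aligner such as MUMmer is the usual tool.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters are kept as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_dotplot(x_seq: str, y_seq: str, k: int = 11) -> List[Tuple[int, int, str]]:
    """Return (x, y, strand) for every k-mer of y_seq that occurs in x_seq.

    x_seq is the horizontal sequence (e.g. FALCON), y_seq the vertical one
    (e.g. FALCON_UNZIP); strand is "+" for direct and "-" for reverse-complement hits.
    """
    x_seq, y_seq = x_seq.upper(), y_seq.upper()  # ignore soft-masking
    index: Dict[str, List[int]] = {}
    for i in range(len(x_seq) - k + 1):
        index.setdefault(x_seq[i:i + k], []).append(i)
    points: List[Tuple[int, int, str]] = []
    for j in range(len(y_seq) - k + 1):
        kmer = y_seq[j:j + k]
        for i in index.get(kmer, []):
            points.append((i, j, "+"))
        for i in index.get(revcomp(kmer), []):
            points.append((i, j, "-"))
    return points
```

Collinear stretches typically show up as diagonal runs of "+" points and inversions as anti-diagonal runs of "-" points; breaks or offsets in the diagonal mark where the two versions diverge.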

In such cases, do you have recommendations for diagnosing these sequence changes, i.e. determining why a sequence was changed and whether the change is erroneous?

I assume that some of the changes reflect haplotype differences, but I also see a number of haplotigs that have no significant match to their paired primary contig.
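One way to flag haplotigs that lack a significant match to their paired primary contig is k-mer containment: the fraction of the haplotig's k-mers that also occur, on either strand, in the primary sequence. This is a swapped-in heuristic, not a FALCON_UNZIP feature. Since a single SNP disrupts at most k overlapping k-mers, a genuine haplotig at typical human heterozygosity should still share most of its k-mers with its primary contig, whereas a value near zero suggests the pair is not homologous. Any cut-off is a judgment call, and because every k-mer is stored in memory, the sketch is practical for the primary-contig region a haplotig is placed against rather than a whole large contig.

```python
from typing import Set

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int) -> Set[str]:
    """All k-mers of seq, uppercased so soft-masked bases compare equal."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(haplotig: str, primary: str, k: int = 21) -> float:
    """Fraction of the haplotig's k-mers found in the primary sequence on either strand."""
    hap_kmers = kmer_set(haplotig, k)
    if not hap_kmers:
        return 0.0  # haplotig shorter than k
    primary_kmers = kmer_set(primary, k) | kmer_set(revcomp(primary), k)
    return len(hap_kmers & primary_kmers) / len(hap_kmers)
```

Pairs that score far below the bulk of the haplotigs would be the first candidates to inspect with a dot plot.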

Best, Jin
