FALCON
consensus-calling with arrow for contigs absent from all_p_ctg.fa but present in p_ctg.fa after falcon_unzip
I have successfully run FALCON (PacificBiosciences/FALCON#514) on the human genome and am now running FALCON_UNZIP. FALCON_UNZIP also completed successfully, but some contigs are absent from its output because the graph is circular and an empty path is returned (#20).
Here are the assembly statistics for p_ctg.fa:
number_of_contigs: 3,904
contig_N50: 24,379,051 bp
minimum_contig_length: 17 bp
maximum_contig_length: 109,706,220 bp
assembly length: 2,892,837,735 bp
The assembly statistics for all_p_ctg.fa:
number_of_contigs: 2,253
contig_N50: 24,379,667 bp
minimum_contig_length: 3,540 bp
maximum_contig_length: 109,710,721 bp
assembly length: 2,857,052,564 bp
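For reference, statistics like the ones above can be computed with a short script. This is only a minimal sketch, not the tool that produced the numbers in this post; the function names (`fasta_lengths`, `n50`, `assembly_stats`) are illustrative, and it assumes plain uncompressed FASTA input.

```python
def fasta_lengths(path):
    """Return {contig_id: sequence_length} for an uncompressed FASTA file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                fields = line[1:].split()
                # Use the first whitespace-delimited token as the contig ID.
                name = fields[0] if fields else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def n50(lengths):
    """Smallest length L such that contigs >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def assembly_stats(path):
    """Summarize contig count, N50, min/max length, and total length."""
    lengths = list(fasta_lengths(path).values())
    if not lengths:
        return {"number_of_contigs": 0, "contig_N50": 0,
                "minimum_contig_length": 0, "maximum_contig_length": 0,
                "assembly_length": 0}
    return {
        "number_of_contigs": len(lengths),
        "contig_N50": n50(lengths),
        "minimum_contig_length": min(lengths),
        "maximum_contig_length": max(lengths),
        "assembly_length": sum(lengths),
    }
```

Calling `assembly_stats("p_ctg.fa")` and `assembly_stats("all_p_ctg.fa")` would give the two tables side by side. Records with a header but no sequence are counted with length 0, which makes empty contigs easy to spot.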
I would like to include some of the circular contigs in consensus calling with arrow, and I would appreciate recommendations for this case. My idea is to use 2-asm-falcon/read_maps/read_to_contig_map to select the reads that map to the circular contigs, and then run arrow on just those contigs using those reads.
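The read-selection step could be sketched roughly as follows. This is a hypothetical helper, not part of FALCON: it assumes read_to_contig_map is whitespace-delimited with one read per line, and the column positions (`read_col`, `contig_col`) are assumptions that need to be checked against the actual file before use.

```python
def reads_for_contigs(map_path, contig_ids, read_col=0, contig_col=2):
    """Collect read IDs per contig from a whitespace-delimited map file.

    read_col and contig_col are ASSUMED column positions (0-based);
    verify them against the real read_to_contig_map layout.
    """
    wanted = set(contig_ids)
    selected = {}
    with open(map_path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank or short lines that lack the expected columns.
            if len(fields) <= max(read_col, contig_col):
                continue
            contig = fields[contig_col]
            if contig in wanted:
                selected.setdefault(contig, set()).add(fields[read_col])
    return selected
```

The resulting read IDs per contig could then be used to subset the reads before aligning them to the circular contigs and running arrow on that subset only.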
In addition, I wanted to ask about contigs that are completely absent from all_p_ctg.fa but present in p_ctg.fa. Is it correct to assume that they have all been incorporated into all_h_ctg.fa? If not, what is the filtering mechanism?
I have also found that many of the absent or empty contigs contain centromeric sequences.
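To list exactly which contigs are affected, the two FASTA files can be compared by contig ID. The sketch below is illustrative only: the function names are made up for this example, and it assumes that contig IDs are the first token of each header and are directly comparable between the two files.

```python
def fasta_lengths(path):
    """Return {contig_id: sequence_length} for an uncompressed FASTA file."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def compare_contig_sets(before_path, after_path):
    """Return (missing, empty): IDs absent from after_path, and IDs
    present in after_path with a zero-length sequence."""
    before = fasta_lengths(before_path)
    after = fasta_lengths(after_path)
    missing = sorted(set(before) - set(after))
    empty = sorted(name for name, length in after.items() if length == 0)
    return missing, empty
```

For example, `compare_contig_sets("p_ctg.fa", "all_p_ctg.fa")` would return the contigs that disappeared after FALCON_UNZIP and those that came back empty, which can then be checked against the headers in all_h_ctg.fa.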
Best, Jin
I have also observed a number of contigs whose sequences changed significantly. I have uploaded a dotplot illustrating one example; the horizontal sequence is from FALCON and the vertical sequence is from FALCON_UNZIP.
In such cases, do you have recommendations for diagnosing the changes, determining why the sequence changed, and checking whether the change is erroneous?
I assume that some of the changes come from haplotype differences, but I also observe a number of haplotigs that show no significant matches to their respective pairs.
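As a rough first screen for pairs without significant matches, one option is to measure how many k-mers of one sequence also occur in the other, on either strand. The sketch below is an assumption on my part rather than an established FALCON_UNZIP diagnostic; the function names and the default k=21 are illustrative, and the pure-Python sets would be memory-hungry on megabase-scale contigs, so it is best suited to short haplotigs or sampled windows.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq, k=21):
    """Set of strand-independent k-mers (lexicographic min of k-mer and its
    reverse complement), skipping any k-mer that contains N."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" in kmer:
            continue
        kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


def kmer_containment(query, target, k=21):
    """Fraction of the query's distinct k-mers that also occur in the target."""
    query_kmers = canonical_kmers(query, k)
    if not query_kmers:
        return 0.0
    target_kmers = canonical_kmers(target, k)
    return len(query_kmers & target_kmers) / len(query_kmers)
```

A haplotig whose containment in its paired primary contig is close to zero would be a candidate for closer inspection with a proper aligner and a dotplot.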
Best, Jin