
Meaning of associated contig identifiers

Open danshu opened this issue 8 years ago • 4 comments

Hi,

Here is the identifier of an associated contig: 000000F-016-01. My understanding is that 000000F is the identifier of the corresponding primary contig; 016 means that this associated contig corresponds to the 16th bubble on the primary contig; and 01 means that this is the alternative path (contigs in a_ctg_base use 00 instead). Am I correct?
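The naming scheme described above can be sketched in a few lines of Python. This is only an illustration of the interpretation given in this post (primary contig, bubble index, path index); the names `ACtgId` and `parse_a_ctg_id` are made up for this example and are not part of FALCON.

```python
import re
from typing import NamedTuple


class ACtgId(NamedTuple):
    """Hypothetical container for the three parts of an associated contig ID."""
    primary: str  # identifier of the primary contig, e.g. "000000F"
    bubble: int   # index of the bubble on the primary contig, e.g. 16
    path: int     # 0 for the copy in a_ctg_base, 1, 2, ... for alternative paths


# Assumed layout: <digits>F-<digits>-<digits>, as in "000000F-016-01".
_ID_RE = re.compile(r"^(\d+F)-(\d+)-(\d+)$")


def parse_a_ctg_id(identifier: str) -> ACtgId:
    """Split an associated contig identifier into its three fields."""
    match = _ID_RE.match(identifier)
    if match is None:
        raise ValueError(f"not an associated contig identifier: {identifier!r}")
    primary, bubble, path = match.groups()
    return ACtgId(primary, int(bubble), int(path))


if __name__ == "__main__":
    print(parse_a_ctg_id("000000F-016-01"))
```

Under this reading, two identifiers that share the same primary contig and bubble index but differ in the last field would be two paths through the same bubble.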

If I'm correct, for the following two identifiers:

001529F-001-01 000521236:E 002486891:E 11979 15764 2 -20 0.87 0.91
001529F-001-02 000521236:E 002486891:E 11915 12996 2 -84 0.96 1.00

Do these two associated contigs correspond to the same bubble, meaning that this bubble has two alternative paths?

And also for some contigs in a_ctg_base.fa, I can not find the corresponding alternative path in a_ctg.fa. For example, I have "002351F-001-00" in a_ctg_base.fa, but I can not find any identifiers starting with "002351F" in a_ctg.fa. How can I interpret this? Thanks in advance for any help!

Best, Quan

danshu, Oct 25 '16 03:10

And also for some contigs in a_ctg_base.fa, I can not find the corresponding alternative path in a_ctg.fa. For example, I have "002351F-001-00" in a_ctg_base.fa, but I can not find any identifiers starting with "002351F" in a_ctg.fa. How can I interpret this? Thanks in advance for any help!

It means that the a-contig 002351F-001-* is almost identical to 002351F-001-00. What might have happened is that some missing overlaps caused the bubble. The generated a-ctg is then likely to be a duplicate and gets "de-dupped".
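To list the base contigs affected by this, one could compare the identifiers in the two files. The following is a minimal sketch, assuming the identifier is the first whitespace-separated token of each FASTA header and that identifiers have the form primary-bubble-path; the helper names are hypothetical and not part of FALCON.

```python
def fasta_ids(lines):
    """Yield the first token of every FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def bubble_key(identifier):
    """Return (primary contig, bubble index) for an ID like '002351F-001-00'."""
    primary, bubble, _path = identifier.split("-")
    return primary, bubble


def base_without_alternative(base_ids, alt_ids):
    """Return base contig IDs whose bubble has no entry among alt_ids."""
    alt_keys = {bubble_key(i) for i in alt_ids}
    return [i for i in base_ids if bubble_key(i) not in alt_keys]


if __name__ == "__main__":
    # File names as used in this thread; adjust paths as needed.
    with open("a_ctg_base.fa") as base, open("a_ctg.fa") as alt:
        missing = base_without_alternative(fasta_ids(base), list(fasta_ids(alt)))
    for identifier in missing:
        print(identifier)
```

Each identifier printed this way would be a base contig whose alternative path was presumably removed as a duplicate.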

pb-jchin, Oct 25 '16 16:10

Thanks for your explanation, @pb-jchin! So for the first question, do associated contigs with a -02 suffix, such as "001529F-001-02", correspond to a second alternative path of that bubble?

danshu, Oct 26 '16 11:10

So for the first question, do associated contigs with -02 suffix such as "001529F-001-02" correspond to second alternative paths of that bubble?

Yes.

pb-jchin, Oct 26 '16 14:10

If a bubble has two alternative paths, does that mean there must be assembly errors (e.g. caused by segmental duplication?) in the case of a diploid genome? How do falcon and falcon_unzip deal with these bubbles?

danshu, Oct 29 '16 03:10