FALCON

contig names usually end with "F" or "R". What about the others?

Open dgordon562 opened this issue 9 years ago • 3 comments

What is the significance of the contigs whose names do NOT end in F or R?

dgordon562 avatar Sep 15 '16 23:09 dgordon562

The file ctg_paths encodes the graph for each contig after the unitigs are analyzed and assembled into
contigs. Each line has 7 columns. The first column is the contig ID. A contig ID is just a serial
number followed by R or F. Two contigs with the same serial number but different endings are "dual" to
each other: they are constructed from "dual" edges and are mostly reverse complements of each
other, except near the ends of the contigs. The second column is the type of contig. If a unitig is
circular (the beginning node and the ending node are the same), it is marked as "ctg_circular".
Everything else is marked as "ctg_linear". In some cases, a contig marked as "ctg_linear" can still be
circular if the beginning node and the ending node are the same but the path is not a "simple" path.
One can detect that by checking the beginning and ending nodes if necessary.

cf. https://github.com/PacificBiosciences/FALCON/wiki/Manual
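
To see which naming pattern each contig in ctg_paths follows, something like the Python sketch below might help. It relies only on what the quoted text states: column 1 is the contig ID and column 2 is the contig type. The whitespace-delimited layout, the helper names, and the sample rows (with "." placeholders for the other five columns) are assumptions made for illustration, not taken from FALCON itself.

```python
from collections import defaultdict


def classify_contig_id(ctg_id):
    """Split a contig ID into (serial, suffix).

    The suffix is "F" or "R" when the ID ends with one of those letters,
    and "" otherwise.
    """
    if ctg_id and ctg_id[-1] in ("F", "R"):
        return ctg_id[:-1], ctg_id[-1]
    return ctg_id, ""


def summarize_ctg_paths(lines):
    """Map each serial number to {suffix: contig type}.

    Assumes whitespace-delimited columns, with the contig ID in column 1
    and the contig type in column 2.
    """
    by_serial = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        serial, suffix = classify_contig_id(fields[0])
        by_serial[serial][suffix] = fields[1]
    return dict(by_serial)


# Made-up rows for illustration; "." stands in for the remaining columns.
sample = [
    "000000F ctg_linear . . . . .",
    "000000R ctg_linear . . . . .",
    "000001F ctg_circular . . . . .",
]
summary = summarize_ctg_paths(sample)
for serial, types in sorted(summary.items()):
    print(serial, types)
```

A serial number that appears with both "F" and "R" is a dual pair in the sense described above; an entry under the empty suffix would be a contig ID ending in neither letter.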

pb-jchin avatar Sep 16 '16 13:09 pb-jchin

Does that answer the question? I understand F and R, but not their absence. Which file has contig names without F/R?

pb-cdunn avatar Sep 26 '16 16:09 pb-cdunn

Hi, Chris,

Sorry for the long delay. I'm guessing (although I admit it isn't clear) that FALCON considers contigs without F or R to be circular. I see such names among the contigs in 2-asm-falcon/p_ctg.fa. What do you think?
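
For anyone wanting to list the affected names, a minimal Python sketch along these lines could pull them out of a FASTA file. It assumes the contig name is the first whitespace-delimited token on each ">" header line; the function name and the sample records are hypothetical.

```python
def names_without_strand_suffix(fasta_lines):
    """Return FASTA record names that end in neither "F" nor "R".

    Assumes the record name is the first whitespace-delimited token
    after ">" on each header line.
    """
    names = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        if not fields:
            continue  # header with no name
        name = fields[0]
        if not name.endswith(("F", "R")):
            names.append(name)
    return names


# Made-up records for illustration only.
sample_fasta = [
    ">000000F extra header text",
    "ACGT",
    ">000001",
    "TTGA",
    ">000002R",
    "GGCC",
]
print(names_without_strand_suffix(sample_fasta))

# On a real assembly, the same function could be fed an open file, e.g.:
# with open("2-asm-falcon/p_ctg.fa") as fh:
#     print(names_without_strand_suffix(fh))
```

Comparing the resulting names against the contig type column in ctg_paths would be one way to test the guess that they correspond to circular contigs.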

David

dgordon562 avatar Oct 05 '16 00:10 dgordon562