FALCON

Information in p_ctg.fa and a_ctg.fa headers

Open • TransGirlCodes opened this issue 8 years ago • 3 comments

Hi,

A colleague has given me some files containing contigs from a FALCON genome assembly. p_ctg.fa contains the primary contigs, with headers like this:

>000002F 000112876:B~000041082:B~000091592:B~000073090:E ctg_linear 1246645 3062142
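A minimal Python sketch for splitting a p_ctg.fa header into its parts, assuming the layout shown above: contig ID, a `~`-separated list of `read:end` tokens, the contig type, and two integers. The thread does not say what the two trailing integers mean or what the `B`/`E` suffixes stand for, so the sketch keeps them as raw values. `PrimaryHeader` and `parse_p_ctg_header` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PrimaryHeader:
    ctg_id: str                   # e.g. "000002F"
    nodes: List[Tuple[str, str]]  # (read id, end label) pairs from the "~"-separated field
    ctg_type: str                 # e.g. "ctg_linear"
    trailing: Tuple[int, int]     # two integers; their meaning is not stated in this thread


def parse_p_ctg_header(line: str) -> PrimaryHeader:
    """Parse a header laid out like the p_ctg.fa example (an assumed 5-field layout)."""
    fields = line.lstrip(">").split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 whitespace-separated fields, got {len(fields)}")
    ctg_id, node_field, ctg_type, first, second = fields
    nodes = []
    for token in node_field.split("~"):
        read_id, _, end_label = token.partition(":")
        nodes.append((read_id, end_label))
    return PrimaryHeader(ctg_id, nodes, ctg_type, (int(first), int(second)))
```

For example, `parse_p_ctg_header(">000002F 000112876:B~000041082:B~000091592:B~000073090:E ctg_linear 1246645 3062142").ctg_id` gives `"000002F"`.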

and a_ctg.fa contains the associated contigs, with headers like this:

>000002F-002-01 000053413:B 000112925:B 25314 39510 6 -15560 0.97 0.70
>000002F-003-01 000072527:E 000123366:E 21446 12965 2 -7309 0.99 0.75
>000002F-006-01 000022616:E 000142565:B 53806 92899 16 -26874 0.99 0.27
>000002F-007-01 000004166:E 000099528:E 25403 39631 5 3835 0.98 0.53
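To line associated contigs up with their primary contig, a sketch like the one below groups a_ctg.fa headers by ID prefix. It assumes, based only on the example IDs above, that the part of an ID before the first `-` (for example `000002F` in `000002F-002-01`) names the primary contig. `primary_id` and `group_by_primary` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, List


def primary_id(a_ctg_id: str) -> str:
    # Assumption: the text before the first "-" is the primary contig ID.
    return a_ctg_id.split("-", 1)[0]


def group_by_primary(headers: List[str]) -> Dict[str, List[str]]:
    """Map each primary contig ID to the associated contig IDs, in input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in headers:
        a_id = line.lstrip(">").split()[0]  # first word of the header is the record ID
        groups[primary_id(a_id)].append(a_id)
    return dict(groups)
```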

What I would like to do is, for each associated contig, align it to its location on its primary contig and then do a window-based scan of sequence divergence between the two.
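One way to do the window-based scan, sketched in Python under the assumption that the two sequences have already been aligned into equal-length gapped strings (`-` for gaps). Divergence here is mismatches divided by columns compared, skipping any column with a gap in either row; that is one convention among several, and window positions are alignment columns, not primary-contig coordinates. `window_divergence` is a hypothetical name.

```python
from typing import List, Tuple


def window_divergence(aln_a: str, aln_b: str, window: int, step: int) -> List[Tuple[int, float]]:
    """Return (window start column, divergence) for each full window.

    Columns with a gap in either row are skipped; a window with no
    comparable columns gets NaN.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    results = []
    for start in range(0, len(aln_a) - window + 1, step):
        compared = mismatches = 0
        for x, y in zip(aln_a[start:start + window], aln_b[start:start + window]):
            if x == "-" or y == "-":
                continue  # indel column: not counted as a substitution
            compared += 1
            if x.upper() != y.upper():
                mismatches += 1
        results.append((start, mismatches / compared if compared else float("nan")))
    return results
```

For instance, `window_divergence("ACGTACGT", "ACGAACGT", 4, 4)` compares two non-overlapping 4-column windows; using `step < window` gives overlapping windows instead.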

I wondered whether the associated contig headers contain any information telling me where on the primary contig each one aligns, such as a start or end base position?

Thanks!

TransGirlCodes • Jun 02 '16 15:06

Hi @Ward9250, if you look for the file a_ctg_base.fa, it contains the sequences in the primary contigs that the alternative contigs map to. We did not put the coordinates in the output, although they can be computed from p_ctg_tiling_path and a_ctg_base_tiling_path.
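A sketch for pairing each associated contig with its base sequence, assuming (not confirmed in this thread) that a_ctg.fa and a_ctg_base.fa use the same record IDs. It does not compute primary-contig coordinates; that would need the tiling path files, whose layout isn't described here. `read_fasta` and `pair_with_base` are hypothetical helpers.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence); record_id is the first word of the header line."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            words = line[1:].split()
            name, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line)  # multi-line sequences are concatenated
    if name is not None:
        yield name, "".join(chunks)


def pair_with_base(a_ctg_lines: Iterable[str], base_lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each a_ctg ID to (a_ctg sequence, base sequence) when the base file has that ID."""
    base = dict(read_fasta(base_lines))
    return {name: (seq, base[name]) for name, seq in read_fasta(a_ctg_lines) if name in base}
```

Usage, with the file names from this thread: `with open("a_ctg.fa") as a, open("a_ctg_base.fa") as b: pairs = pair_with_base(a, b)`. Both files are read fully into memory.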

The header does show some of the alignment information between the associated contig and the primary contig. The last three fields are (1) length difference, (2) alignment identity and (3) alignment coverage. For example, -15560 0.97 0.70 indicates that the length difference is 15560 bases, the identity of the aligned part is 0.97, and the aligned part covers only 70% of the contig.
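The three trailing fields can be read directly off each a_ctg.fa header. This sketch follows the field meanings given in the reply above; the sign convention of the length difference isn't explained, so the raw signed value is kept and only its magnitude is printed. `AltAlignmentSummary`, `parse_alignment_summary` and `describe` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AltAlignmentSummary:
    length_diff: int  # third-to-last field: length difference in bases (sign kept as-is)
    identity: float   # second-to-last field: identity of the aligned part
    coverage: float   # last field: fraction of the contig that is aligned


def parse_alignment_summary(header: str) -> AltAlignmentSummary:
    """Read the last three whitespace-separated fields of an a_ctg.fa header."""
    fields = header.split()
    if len(fields) < 4:
        raise ValueError("header has too few fields")
    return AltAlignmentSummary(int(fields[-3]), float(fields[-2]), float(fields[-1]))


def describe(s: AltAlignmentSummary) -> str:
    return (f"length difference {abs(s.length_diff)} bases, "
            f"identity {s.identity:.2f}, {s.coverage:.0%} of the contig aligned")
```

For the example above, `describe(parse_alignment_summary(">000002F-002-01 000053413:B 000112925:B 25314 39510 6 -15560 0.97 0.70"))` returns `"length difference 15560 bases, identity 0.97, 70% of the contig aligned"`. To keep only well-covered associated contigs, something like `[h for h in headers if parse_alignment_summary(h).coverage >= 0.9]` works, where 0.9 is an arbitrary illustrative threshold.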

pb-jchin • Jun 02 '16 21:06

@pb-jchin That makes sense, thanks! I suppose that since an a_ctg and an a_base_ctg can have different lengths, I should align the two to each other before doing anything like a sliding-window scan.
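For short test sequences, a textbook Needleman-Wunsch global alignment is enough to produce the equal-length gapped pair that a sliding window needs. This is a swapped-in illustration, not something FALCON provides: it uses O(n × m) time and memory with a linear gap penalty, so for contig-length sequences a dedicated aligner (for example minimap2 or MUMmer's nucmer) would be the practical choice. `global_align` and its default scores are assumptions.

```python
from typing import Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns two equal-length strings with "-" marking gaps.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming table
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))
```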

TransGirlCodes • Jun 03 '16 14:06

If the lengths differ, in some cases the deletion or insertion variants can be at the beginning (or the end) of an a-ctg.
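Because an end indel can account for much of the length difference, one option (an assumption, not something the reply prescribes) is to trim leading and trailing columns where either row has a gap before a window scan, while recording how many columns were removed at each end. `trim_terminal_indels` is a hypothetical helper that works on an already-aligned, equal-length pair.

```python
from typing import Tuple


def trim_terminal_indels(aln_a: str, aln_b: str) -> Tuple[str, str, int, int]:
    """Drop leading/trailing alignment columns where either row has a gap ("-").

    Returns (trimmed_a, trimmed_b, columns_trimmed_at_start, columns_trimmed_at_end).
    Internal gaps are left untouched.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    start, end = 0, len(aln_a)
    # Advance past indel columns at the start
    while start < end and (aln_a[start] == "-" or aln_b[start] == "-"):
        start += 1
    # Walk back past indel columns at the end
    while end > start and (aln_a[end - 1] == "-" or aln_b[end - 1] == "-"):
        end -= 1
    return aln_a[start:end], aln_b[start:end], start, len(aln_a) - end
```

The two counts give a rough idea of how much of the length difference sits at the contig ends rather than in the interior.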

pb-jchin • Jun 03 '16 17:06