Question about +/- in final_decomposition.tsv

Open 865699871 opened this issue 2 years ago • 2 comments

In the study by Altemose et al. (Complete genomic and epigenetic maps of human centromeres), CHM13 Cen1 contains a 1.7 Mb inversion inside the active α HOR array (Fig. 2a). We ran StringDecomposer on the Cen1 active α HOR array. However, all entries in final_decomposition.tsv are marked +. Can StringDecomposer report the strand (+ / -) of each aligned sequence?

865699871 avatar Mar 21 '22 11:03 865699871

Hi,

Thank you for your interest in StringDecomposer! In our TSV files, the +/- at the end of each row refers to the "reliability" of the alignment, not the strand (see the Quick start section for more information about the output). This characteristic is needed for monomer-to-read alignment only.

The strand is represented by a ' at the end of the monomer name. Consider two rows in the final TSV file:

ref mon 1 171 99
ref mon' 172 343 99

The second row shows that monomer mon is aligned to the reverse strand with identity 99.
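As a rough illustration of that convention, here is a minimal Python sketch that reads the strand off the monomer name. It is not part of StringDecomposer: the `MonomerHit` type and `parse_row` function are hypothetical names, and the sketch assumes rows are tab-separated with the first five columns in the order shown in the example (sequence name, monomer name, start, end, identity). Any further columns are ignored.

```python
from typing import NamedTuple


class MonomerHit(NamedTuple):
    """One decomposition row (hypothetical type, for illustration only)."""
    sequence: str
    monomer: str   # monomer name with the trailing ' removed
    start: int
    end: int
    identity: float
    strand: str    # "+" for forward, "-" for reverse


def parse_row(line: str) -> MonomerHit:
    """Parse one row, assuming tab-separated columns as in the example above."""
    fields = line.rstrip("\n").split("\t")
    sequence, name, start, end, identity = fields[:5]
    # A trailing apostrophe on the monomer name marks the reverse strand.
    if name.endswith("'"):
        strand, name = "-", name[:-1]
    else:
        strand = "+"
    return MonomerHit(sequence, name, int(start), int(end), float(identity), strand)


rows = ["ref\tmon\t1\t171\t99", "ref\tmon'\t172\t343\t99"]
for row in rows:
    hit = parse_row(row)
    print(hit.monomer, hit.start, hit.end, hit.strand)
# prints:
# mon 1 171 +
# mon 172 343 -
```

Note that the strand here comes only from the monomer name; the +/- reliability column at the end of each row is deliberately left out.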

We understand that this representation of the strand is a bit misleading, and we are going to add a BED-file representation of the StringDecomposer output in the next release. For now, you can use our internal script, convert2bed.py, to convert the StringDecomposer final TSV file to a BED file.

If this doesn't help, please don't hesitate to ask further questions!

Best, Tanya

TanyaDvorkina avatar Mar 21 '22 13:03 TanyaDvorkina

Thank you for your response!

865699871 avatar Mar 21 '22 13:03 865699871