stringdecomposer
Question about +/- in final_decomposition.tsv
In the study by Altemose et al. (Complete genomic and epigenetic maps of human centromeres), CHM13 Cen1 contains a 1.7 Mb inversion inside the active α HOR array (Fig 2a). We ran StringDecomposer on the Cen1 active α HOR array. However, all items in final_decomposition.tsv are marked +. Can StringDecomposer mark + / - for the sequence strand?
Hi,
Thank you for your interest in StringDecomposer! In our tsv-files, the +/- at the end of each row refers to the "reliability" of the alignment (see the Quick start section for more on the output). This flag is only relevant for monomer-to-read alignment.
The strand is indicated by an apostrophe (') at the end of the monomer name. Consider two rows in the final tsv-file:
ref mon 1 171 99
ref mon' 172 343 99
The second row shows that monomer mon is aligned to the reverse strand with identity 99.
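As a rough illustration of the convention above, here is a minimal Python sketch that reads the strand off the monomer name and writes BED-like lines. It assumes the first five tab-separated columns are sequence name, monomer name, start, end and identity, as in the two example rows. The helper names `parse_strand` and `tsv_row_to_bed` are hypothetical, and this is not the convert2bed.py script.

```python
def parse_strand(monomer_name):
    """Return (base_name, strand); a trailing apostrophe means reverse strand."""
    if monomer_name.endswith("'"):
        return monomer_name[:-1], "-"
    return monomer_name, "+"


def tsv_row_to_bed(line):
    """Turn one decomposition row into a BED-like line with an explicit strand."""
    # Assumed column order (taken from the example rows, not from a spec):
    # sequence name, monomer name, start, end, identity.
    fields = line.rstrip("\n").split("\t")
    seq, monomer, start, end, identity = fields[:5]
    name, strand = parse_strand(monomer)
    # Coordinates are copied unchanged; no 0-based/1-based conversion is done.
    return "\t".join([seq, start, end, name, identity, strand])


rows = ["ref\tmon\t1\t171\t99", "ref\tmon'\t172\t343\t99"]
bed_lines = [tsv_row_to_bed(row) for row in rows]
for bed_line in bed_lines:
    print(bed_line)
```

The identity value is kept as a string, so it passes through unchanged whatever its formatting is in the real output.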
We understand that this representation of the strand is a bit misleading, and we are going to add a bed-file representation of the StringDecomposer output in the next release. For now, you can use our internal script convert2bed.py to convert the final StringDecomposer tsv-file to a bed-file.
If this doesn't help, please don't hesitate to ask further questions!
Best, Tanya
Thank you for your response!