TideHunter subPos & match score feature request


Open · zztin opened this issue 4 years ago · 9 comments

Hi Gao, I tried to retrieve the repeated subunits from each long read and feed them into other consensus-calling methods (such as Medaka by ONT, or majority voting).

According to the README: subPos: start coordinates of all the tandem repeat unit sequence, followed by the end coordinate of the last tandem repeat unit sequence, separated by ",", all coordinates are 1-based.
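For reference, here is a minimal Python sketch of how I currently turn subPos into per-unit (start, end) pairs. It assumes consecutive units are contiguous, so each unit ends one base before the next one starts. That is my reading of the README, not something TideHunter states, and `subpos_to_intervals` is just my own helper name:

```python
def subpos_to_intervals(subpos: str) -> list[tuple[int, int]]:
    """Convert a subPos string into 1-based inclusive (start, end) pairs.

    Assumption: units are contiguous, so unit i ends at start(i+1) - 1;
    the last value in subPos is the end of the last unit.
    """
    coords = [int(c) for c in subpos.split(",") if c.strip()]
    if len(coords) < 2:
        raise ValueError("subPos needs at least one start and one end coordinate")
    starts = coords[:-1]
    # End of each unit except the last is one base before the next start.
    ends = [s - 1 for s in coords[1:-1]] + [coords[-1]]
    return list(zip(starts, ends))


# A 1-based inclusive (start, end) maps to the Python slice seq[start - 1:end].
print(subpos_to_intervals("1,101,201,300"))  # [(1, 100), (101, 200), (201, 300)]
```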

Problems I faced:

When 5' and 3' primers are given, subPos reports the start of each tandem repeat unit, not the start of the target sequence within it. However, the reported length is the length of the target sequence only, so the length of the full tandem repeat unit is not reported anywhere.

Is it possible to report the position where the target sequence starts, instead of the start of the whole tandem repeat unit?
Is it possible to include the (start, end) position of each subunit? Alternatively, could there be an option to export all repeat subunits to a FASTQ file, with identifiable read names such as readname_consX_repY?
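To make the export request concrete, here is a rough Python sketch of what I do on my side. It slices each subunit out of the read and writes it as a FASTQ record named `<read>_cons<X>_rep<Y>`. The naming scheme, the function names and the contiguous-unit interpretation of subPos are my own assumptions, not existing TideHunter behaviour:

```python
import io
from typing import TextIO


def subpos_to_intervals(subpos: str) -> list[tuple[int, int]]:
    """1-based inclusive (start, end) per unit, assuming contiguous units."""
    coords = [int(c) for c in subpos.split(",") if c.strip()]
    if len(coords) < 2:
        raise ValueError("subPos needs at least one start and one end coordinate")
    return list(zip(coords[:-1], [s - 1 for s in coords[1:-1]] + [coords[-1]]))


def write_subunits_fastq(read_name: str, seq: str, qual: str, cons_idx: int,
                         subpos: str, out: TextIO) -> int:
    """Write each repeat subunit as a FASTQ record; return the record count.

    Record names follow the hypothetical scheme <read>_cons<X>_rep<Y>.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    n_written = 0
    for rep_idx, (start, end) in enumerate(subpos_to_intervals(subpos), 1):
        # Convert 1-based inclusive coordinates to a Python slice.
        out.write(f"@{read_name}_cons{cons_idx}_rep{rep_idx}\n"
                  f"{seq[start - 1:end]}\n+\n{qual[start - 1:end]}\n")
        n_written += 1
    return n_written


if __name__ == "__main__":
    buf = io.StringIO()
    write_subunits_fastq("read1", "ACGTACGTACGT", "IIIIIIIIIIII", 1, "1,5,9,12", buf)
    print(buf.getvalue(), end="")
```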

In some reads, multiple consensus sequences of different lengths are reported for (completely) overlapping regions. Would it be possible to add a column reporting an overall alignment score of the subunits?
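As a workaround I flag these cases myself. Below is a small Python sketch that lists pairs of consensus records on the same read whose (start, end) regions overlap. The `(cons_id, start, end)` tuple layout is my own simplification, not the exact TideHunter output columns:

```python
def overlapping_pairs(records: list[tuple[str, int, int]]) -> list[tuple[str, str]]:
    """Return pairs of consensus IDs whose 1-based inclusive regions overlap.

    records: (cons_id, start, end) tuples, all from the same read.
    """
    ordered = sorted(records, key=lambda r: (r[1], r[2]))
    pairs = []
    for i, (id_a, _start_a, end_a) in enumerate(ordered):
        for id_b, start_b, _end_b in ordered[i + 1:]:
            # Sorted by start, so once a start lies past end_a, no later record overlaps.
            if start_b > end_a:
                break
            pairs.append((id_a, id_b))
    return pairs


print(overlapping_pairs([("c1", 1, 1000), ("c2", 1, 1000), ("c3", 1200, 1500)]))
# [('c1', 'c2')]
```

A score column would make it possible to pick one record from each flagged pair instead of guessing.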

I see there is a criterion to filter by the maximum divergence rate between two consecutive repeats, but this does not necessarily reflect the quality of the overall consensus. Is this interpretation correct? Would it be possible to add a score reporting the divergence rate of all repeats from the consensus sequence?
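To illustrate the kind of score I mean, here is a Python sketch that computes, for each repeat unit, the global edit distance to the consensus divided by the consensus length, plus the mean over all units. This is only one possible definition that I chose for illustration. It is not how TideHunter computes its divergence rate. Units that still contain primer sequence will inflate this number.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion from a
                           cur[j - 1] + 1,              # insertion into a
                           prev[j - 1] + (ca != cb)))   # match or substitution
        prev = cur
    return prev[-1]


def divergence_to_consensus(units: list[str], consensus: str) -> tuple[list[float], float]:
    """Per-unit divergence (edit distance / consensus length) and their mean."""
    if not units or not consensus:
        raise ValueError("need at least one unit and a non-empty consensus")
    rates = [edit_distance(u, consensus) / len(consensus) for u in units]
    return rates, sum(rates) / len(rates)


print(divergence_to_consensus(["ACGT", "ACGA"], "ACGT"))  # ([0.0, 0.25], 0.125)
```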

Thank you very much!!

zztin, May 13, 2020, 15:05
