uta
project transcript CDS bounds onto aligned sequences
Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/uta #197. Migrated by bitbucket-issue-migration on 2016-09-09 15:15:07.
Currently, the only way to get the CDS start and end onto aligned sequences is to project them explicitly with the hgvs package. That's annoying, and it precludes rapid querying for CDS variants in a genomic context.
In the vast majority of cases, projecting the CDS onto aligned sequences is trivial. However, when there are indels in the exon containing the CDS start or end, the calculation really requires an alignment (e.g., a CIGAR string) to locate the position precisely. So, the reason these aren't in UTA now is that the alignment code is in hgvs, which creates a circular dependency: hgvs needs a released UTA to project variants, but UTA (now) needs the alignment code in hgvs to be released.
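To illustrate why an indel in the CDS start or end exon forces a walk over the alignment, here is a minimal sketch of projecting a transcript offset onto the aligned sequence. It is not the hgvs or UTA implementation. The function name `project_tx_to_ref` is hypothetical, and the sketch assumes SAM-style CIGAR semantics with the transcript as the query (`I` = bases only in the transcript, `D`/`N` = bases only in the aligned sequence) and 0-based offsets within a single exon alignment. UTA's stored CIGAR convention may differ.

```python
import re

# Hypothetical helper: SAM-style ops only; the transcript is treated as the query.
CIGAR_RE = re.compile(r"(\d+)([MIDN=X])")


def project_tx_to_ref(cigar, tx_offset):
    """Map a 0-based transcript offset within an exon to a 0-based offset
    within the aligned (e.g. genomic) segment, using the exon's CIGAR."""
    tx_pos = 0   # transcript bases consumed so far
    ref_pos = 0  # aligned-sequence bases consumed so far
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned block: both coordinates advance together.
            if tx_offset < tx_pos + n:
                return ref_pos + (tx_offset - tx_pos)
            tx_pos += n
            ref_pos += n
        elif op == "I":
            # Bases present only in the transcript: there is no counterpart,
            # so this sketch snaps to the next aligned-sequence base.
            if tx_offset < tx_pos + n:
                return ref_pos
            tx_pos += n
        else:
            # D or N: bases present only in the aligned sequence.
            ref_pos += n
    raise ValueError("transcript offset lies beyond the alignment")
```

Without indels the projection is a plain offset (`project_tx_to_ref("10M", 5)` gives 5), but with a 2-base deletion upstream the same kind of lookup shifts (`project_tx_to_ref("5M2D5M", 7)` gives 9), which is the case that simple arithmetic on exon bounds gets wrong.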
The solution is to move the mapping code to a separate package used by both tools.