uta_20150827 is missing ENSP accessions and sequences
Originally reported by Reece Hart (Bitbucket: reece, GitHub: reece) in biocommons/uta #194.
Migrated by bitbucket-issue-migration on 2016-09-09 15:15:07.
uta_20150827 does not include ENSP sequences or seqinfo. One consequence is that c_to_p transformations in hgvs report MD5-based accessions instead of ENSP accessions.
To resolve this issue, uta should be updated with ENSP sequences and accessions (from release-79).
FWIW, the ENSP data are missing because Ensembl sequence accessions, as provided via fasta files on their web site, were found to be non-unique. That is, a single accession may be associated with more than one sequence. Roughly 10,000 instances of ambiguous ENSPs exist between e-71 and e-81.
(It's likely that these ambiguities are distinguished by stable_id versions internally, but these distinctions are not exposed in the fasta files.)
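For illustration only, the kind of ambiguity described above could be detected with something like the sketch below. It is not the code used to build uta. It assumes that each fasta header starts with the accession as its first whitespace-delimited token, and the names `parse_fasta` and `find_ambiguous_accessions`, the release labels, and the toy sequences are all hypothetical.

```python
import hashlib
from collections import defaultdict


def parse_fasta(text):
    """Yield (accession, sequence) pairs from FASTA-formatted text."""
    accession, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if accession is not None:
                yield accession, "".join(chunks)
            # Assumption: the accession is the first token of the header line.
            tokens = line[1:].split()
            accession, chunks = (tokens[0] if tokens else ""), []
        else:
            chunks.append(line)
    if accession is not None:
        yield accession, "".join(chunks)


def find_ambiguous_accessions(releases):
    """Return accessions seen with more than one distinct sequence.

    `releases` maps a release label to FASTA text. The result maps each
    ambiguous accession to {sequence_md5: [release labels]}.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for label, text in releases.items():
        for accession, sequence in parse_fasta(text):
            digest = hashlib.md5(sequence.upper().encode("ascii")).hexdigest()
            seen[accession][digest].append(label)
    return {ac: dict(by_md5) for ac, by_md5 in seen.items() if len(by_md5) > 1}


# Toy data (made up): one accession changes sequence between releases.
releases = {
    "e-71": ">ENSP00000000001 pep\nMKT\nAYI\n>ENSP00000000002 pep\nMSS\n",
    "e-81": ">ENSP00000000001 pep\nMKTAYL\n>ENSP00000000002 pep\nMSS\n",
}
ambiguous = find_ambiguous_accessions(releases)
print(sorted(ambiguous))  # ['ENSP00000000001']
```

Keying on a sequence digest rather than the raw sequence keeps the bookkeeping small when comparing many releases. If the fasta headers carried stable_id versions, the accession token would already distinguish these cases.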