ExpansionHunter

Details on repeat unit of interrupted and alternate alleles

Open dnil opened this issue 5 years ago • 3 comments

The two most common issues we have right now pertain to annotating the repeat units: A) the precise repeat unit actually present in a matching repeat (as for RFC1), and B) interrupted versus non-interrupted allele expansions, as in ATXN1.

The latter (B) is perhaps to some extent covered by repeat purity, and may be solved by exposing or using it, but automatically recovering the longest uninterrupted pure sub-stretch would be useful.
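To make "longest uninterrupted pure sub-stretch" concrete, here is a minimal Python sketch. It is not ExpansionHunter code. The function name `longest_pure_stretch` and the toy allele are hypothetical, and it assumes the allele sequence is already available as a plain string.

```python
def longest_pure_stretch(sequence, motif):
    """Return (start, copies) of the longest run of back-to-back motif copies.

    Hypothetical helper for illustration only. Every start position is
    tried, so runs in any phase are found. Ties keep the leftmost run.
    """
    if not motif:
        raise ValueError("motif must be a non-empty string")
    best_start, best_copies = 0, 0
    for start in range(len(sequence)):
        copies = 0
        pos = start
        # Extend the run while the next len(motif) bases match exactly.
        while sequence.startswith(motif, pos):
            copies += 1
            pos += len(motif)
        if copies > best_copies:
            best_start, best_copies = start, copies
    return best_start, best_copies


# Toy ATXN1-like allele (made up): CAG repeats interrupted by CAT units.
allele = "CAG" * 12 + "CATCAGCAT" + "CAG" * 15
print(longest_pure_stretch(allele, "CAG"))  # (45, 15)
```

The scan is quadratic in the worst case, which should be fine for single-locus allele sequences; a linear pass would be needed for anything much longer.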

It would be helpful for screening to be able to see directly from the VCF whether the discovered RFC1 alleles were AAGGG or AAAAG, or one of the other, slightly rarer versions, along with the zygosity, to tell whether the expanded locus was homozygous normal or pathologic.
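As a rough illustration of the kind of annotation meant here, the sketch below scores candidate motifs against an allele sequence by the fraction of bases they cover. It is only an assumption about how such a call could be made, not how EH works. The candidate list is hypothetical beyond the two motifs named above (AAAGG stands in for "other versions"), and all function names and the toy allele are made up.

```python
# Hypothetical candidate list: only AAGGG and AAAAG are named in this
# issue; AAAGG is an assumed stand-in for the rarer versions.
CANDIDATE_MOTIFS = ("AAAAG", "AAGGG", "AAAGG")


def motif_coverage(allele_seq, motifs=CANDIDATE_MOTIFS):
    """Fraction of the allele covered by non-overlapping copies of each motif."""
    if not allele_seq:
        raise ValueError("allele_seq must be non-empty")
    return {
        motif: allele_seq.count(motif) * len(motif) / len(allele_seq)
        for motif in motifs
    }


def dominant_motif(allele_seq, motifs=CANDIDATE_MOTIFS):
    """Return the candidate motif covering the largest share of the allele."""
    coverage = motif_coverage(allele_seq, motifs)
    return max(coverage, key=coverage.get)


# Toy allele (made up): mostly AAGGG with a single AAAGG unit.
allele = "AAGGG" * 8 + "AAAGG" + "AAGGG" * 3
print(dominant_motif(allele))
print(motif_coverage(allele))
```

Running this per allele would give one motif call for each haplotype, which is the information needed to judge zygosity. Note that `str.count` matches in any phase, so rotations of a motif would need separate handling.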

dnil, Dec 12 '19 13:12

Thanks for the suggestion! We will work on enabling EH to annotate motif changes. Do you have any samples that have such mutations?

egor-dolzhenko, Dec 16 '19 15:12

Sorry, I missed the reply! Yes, we do have samples for RFC1, but I kind of think you do as well, right? 😸 Off the top of my head, we also have at least interrupted ATXN1, but that does seem to be the normal case, so I'm sure you have those as well. We'll keep a lookout for an uninterrupted one; just let me know if you would like slices for any of the others.

dnil, Feb 25 '20 15:02

@dnil, this sounds good. Once we implement an initial version of the motif change annotation algorithm, would you be up for running it on some of your data to confirm that the results look accurate?

egor-dolzhenko, Feb 26 '20 17:02