pyfaidx
Support for indels in FastaVariant class
Hello - I was wondering if it would be possible to recognize indels within the FastaVariant class? What are the challenges involved there?
Hmmm... I haven't paid much attention to the FastaVariant class recently. I think indels shouldn't be too hard, but I omitted them originally as I wanted to maintain a 1-1 mapping with the original reference coordinates. Do you have a use case for this?
Sure - well, I can describe my reason for wanting this implemented.
I am developing a number of utilities for working with VCF files. One of the tools is aimed at helping to validate variants within VCFs. It generates primers (using primer3) for Sanger sequencing or snip-SNP verification based on any variants that are provided as input. However, when generating primers or looking for restriction sites, I want to account for neighboring variation, both to increase the chances that primers work (by incorporating alternative alleles/indels) and to make sure that predicted product sizes (resulting from differences in restriction sites) are accurate.
In terms of coordinates - I don't think it is an issue? I always intend to work off of reference coordinates and account for differences afterwards. In other words, if I slice from I:1-100, and this region contains an insertion at 50, it should return the reference from 1-100, and THEN add the insertion. The resulting string will be longer than 100 bp. For a deletion, the string would be shorter than 100 bp. Are there any reasons why this might be an issue?
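To make the "slice by reference coordinates, then apply the variants" idea concrete, here is a minimal sketch in plain Python. It is not pyfaidx code: `apply_variants` and its `(pos, ref, alt)` tuples are hypothetical names chosen for illustration, and it assumes VCF-style records where an indel includes the anchoring reference base.

```python
def apply_variants(ref_seq, region_start, variants):
    """Apply VCF-style variants to a reference slice (hypothetical helper).

    ref_seq      -- reference sequence already sliced by reference coordinates
    region_start -- 1-based reference coordinate of ref_seq[0]
    variants     -- iterable of (pos, ref, alt); pos is a 1-based reference
                    coordinate, ref/alt are allele strings as in a VCF record
    """
    pieces = []
    cursor = 0  # 0-based offset of the next unconsumed reference base
    for pos, ref, alt in sorted(variants):
        offset = pos - region_start
        # Skip variants that fall outside the slice or overlap an earlier one.
        if offset < cursor or offset + len(ref) > len(ref_seq):
            continue
        if ref_seq[offset:offset + len(ref)] != ref:
            raise ValueError(f"REF allele mismatch at position {pos}")
        pieces.append(ref_seq[cursor:offset])  # unchanged reference bases
        pieces.append(alt)                     # substituted allele
        cursor = offset + len(ref)
    pieces.append(ref_seq[cursor:])
    return "".join(pieces)


reference = "ACGTACGTAC"  # toy 10 bp slice starting at reference position 1

# Insertion at position 5 (A -> ATTT): the result is longer than the slice.
print(apply_variants(reference, 1, [(5, "A", "ATTT")]))  # ACGTATTTCGTAC

# Deletion at position 5 (ACG -> A): the result is shorter than the slice.
print(apply_variants(reference, 1, [(5, "ACG", "A")]))   # ACGTATAC
```

The slice itself is always requested in reference coordinates; only the returned string changes length, which is the behaviour described above.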
Thanks for your continual support of pyfaidx - it has been very useful.
It would be nice to have indels implemented the same way "bcftools consensus" works. For now I run bcftools as a subprocess to incorporate all VCF records into FASTA regions.
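For reference, the subprocess workaround looks roughly like the sketch below. It assumes `samtools` and `bcftools` are on the PATH and that the VCF is bgzip-compressed and indexed; the function names and file paths are placeholders, not part of pyfaidx.

```python
import subprocess


def consensus_commands(fasta, vcf_gz, region):
    """Build the two command lines (hypothetical helper).

    Equivalent shell pipeline:
        samtools faidx <fasta> <region> | bcftools consensus <vcf_gz>
    """
    faidx_cmd = ["samtools", "faidx", fasta, region]
    consensus_cmd = ["bcftools", "consensus", vcf_gz]
    return faidx_cmd, consensus_cmd


def region_consensus(fasta, vcf_gz, region):
    """Return FASTA text for `region` with the VCF records applied."""
    faidx_cmd, consensus_cmd = consensus_commands(fasta, vcf_gz, region)
    # Extract the reference region as FASTA.
    faidx = subprocess.run(faidx_cmd, check=True, capture_output=True)
    # Feed that FASTA to bcftools consensus on stdin.
    result = subprocess.run(
        consensus_cmd, input=faidx.stdout, check=True, capture_output=True
    )
    return result.stdout.decode()
```

A call such as `region_consensus("ref.fa", "calls.vcf.gz", "I:1-100")` would return the region with SNPs and indels applied, so the sequence may be longer or shorter than 100 bp.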
Thank you in advance
Thanks for the feedback. I agree that the bcftools model is appropriate, and if I can get some time, or someone willing to help with the implementation, it will get done :).