PyVCF
Potential error in vcf.model._Record._set_start_and_end
In vcf.model._Record._set_start_and_end, I notice that the first line initializes the affected start and end like this: self.affected_start = self.affected_end = self.POS
and then, after doing the start/end calculations in a zero-based manner, follows that up with:
self.affected_start = min(self.affected_start, start)
self.affected_end = max(self.affected_end, end)
Wouldn't self.affected_start and self.affected_end still be in 1-based coordinates (taken from the value of POS) when the min and max calls run, while the start/end variables have been computed in a zero-based coordinate system?
Keep in mind that for multi-base variants, the first base of REF and ALT will be the same (i.e., unchanged), so the affected region should not always include the REF base.
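To make the coordinate question concrete, here is a minimal sketch of the min/max logic quoted above. It is not PyVCF's actual implementation: the helper names allele_region and affected_region are hypothetical, and the per-allele rule (a single-base substitution touches the base at POS, any multi-base record skips its unchanged first base) is an assumption made for illustration. Under that assumption, the number POS always lies between the zero-based start and the open end of every allele's region, so seeding both values with POS never widens the result.

```python
def allele_region(pos, ref, alt):
    """Zero-based, open-ended region touched by one ALT allele.

    Assumption for this sketch: a single-base substitution touches the
    base at POS itself; any multi-base record has an unchanged first
    base, which is left out of the region.
    """
    if len(ref) == 1 and len(alt) == 1:
        return pos - 1, pos
    # The zero-based index of the second REF base equals the 1-based POS.
    return pos, pos - 1 + len(ref)


def affected_region(pos, ref, alts):
    # Same seeding as quoted in the question: both ends start at POS.
    affected_start = affected_end = pos
    for alt in alts:
        start, end = allele_region(pos, ref, alt)
        affected_start = min(affected_start, start)
        affected_end = max(affected_end, end)
    return affected_start, affected_end


print(affected_region(10, "A", ["T"]))         # single-base substitution
print(affected_region(10, "ACG", ["A"]))       # deletion of two bases
print(affected_region(10, "A", ["T", "ACG"]))  # substitution plus insertion
```

In each case the result equals the plain minimum of the per-allele starts and maximum of the per-allele ends, which is why mixing POS into the comparison is harmless here.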
I think the code works as advertised. Can you think of a concrete example where it doesn't?
By the way, you can find some more discussion on handling missing ALT alleles in the corresponding pull request #161.
I don't know the format specification as well as you do. I assume the specification for multi-base variants states that the first base should be the same?
Jumping between 0 and 1 gets tricky. I'll trust that you know what you're doing; it just seemed like it was comparing across different indexing systems. (If you intended for the result to effectively be +1, then it's all good.)
If I'm looking to compare the VCF to entries in a MAF file, I assume I ought to be using affected_start and affected_end?
Should I be cutting the first base off for the MNV records then? Is there an equivalent affected_ref, and an affected version of the ALT?
I'm not sure what a MAF file is, but the affected_start and affected_end fields give you exactly the complete region (zero-based, open-ended) on the reference that is affected (by all alternative alleles combined). There currently is no shortcut to get the affected region by only one of the alternative alleles.
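For comparing against a file that uses different coordinates, a small conversion may help. The sketch below turns a zero-based, open-ended interval into a 1-based, inclusive one. The function name to_one_based_closed is hypothetical, and whether the other format really uses 1-based inclusive positions is an assumption that should be checked against that format's documentation.

```python
def to_one_based_closed(start, end):
    """Convert a zero-based, open-ended interval to 1-based, inclusive.

    An empty interval (end <= start), as can arise for a pure insertion,
    has no inclusive form, so it is rejected rather than guessed at.
    """
    if end <= start:
        raise ValueError("empty interval has no 1-based inclusive form")
    return start + 1, end


# A region covering only the zero-based base 9 is position 10 in 1-based terms.
print(to_one_based_closed(9, 10))
# A two-base region starting at zero-based index 10 covers positions 11 and 12.
print(to_one_based_closed(10, 12))
```

Only the start shifts by one; the open end of the zero-based interval is already the last included position in 1-based terms.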
It's a format used by the TCGA project. It has some similarities to other annotation formats, but it also carries sample information and columns for base calls in both the reference and the matched normals.
Luckily the VCF file from COSMIC doesn't appear to contain any multi-ALT entries, so that simplified my life a bit.
TY for your response and assistance thus far.