PyVCF
Incorrect uncalled call writing
Sometime between June 21 and now, the first sample in the last row of walk_left.vcf started being written as "./." instead of "./.:35:4".
Probably something I did, so I'll look into it asap.
But it makes me curious: if the call isn't called, is the data associated with it useful? I recall grepping the test directory for "./.:" followed by a digit and only finding this file. Is "./.:35:4" something that never appears in nature?
If this is the case, would it be more correct to print "./.:.:."?
Good spot, Lenna!
Looks like it has been around for a while: https://github.com/jamescasbon/PyVCF/blob/master/vcf/parser.py#L589
The correct behaviour for the writer is to write the same number of fields as it read, so we cannot just return './.' when there is no GT. Also, an entry need not define all the fields, so we need to track how many it did define. Now that we are using a namedtuple, we cannot use the non-existence of a key to tell that a field wasn't reported.
Do you fancy adding some tests for this to the writer?
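To make the behaviour described above concrete, here is a minimal sketch of a formatter that writes exactly as many fields as were read. It uses a plain namedtuple rather than PyVCF's own classes; `CallData`, `format_call` and the `n_reported` argument are hypothetical names for illustration, and list-valued fields are ignored for simplicity.

```python
from collections import namedtuple


def format_call(data, n_reported):
    """Render one sample column, keeping only the first n_reported fields.

    `data` is a namedtuple with one slot per FORMAT key; `n_reported` is the
    number of fields the input line actually contained (an assumption: the
    parser would have to record this somewhere).
    """
    values = []
    for name, value in zip(data._fields[:n_reported], data[:n_reported]):
        if value is None:
            # An uncalled genotype is "./."; any other missing value is ".".
            values.append("./." if name == "GT" else ".")
        else:
            values.append(str(value))
    return ":".join(values)


# Hypothetical stand-in for the per-record namedtuple.
CallData = namedtuple("CallData", ["GT", "GQ", "DP"])

# Uncalled genotype, but the other fields were present in the input.
full = format_call(CallData(GT=None, GQ=35, DP=4), n_reported=3)

# Uncalled genotype and nothing else was reported on the input line.
short = format_call(CallData(GT=None, GQ=None, DP=None), n_reported=1)
```

Under these assumptions `full` is "./.:35:4" and `short` is "./.", which is the kind of pair a writer test could assert on.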
I'm wondering if a simple fix would be this:
if sample.data.GT is None:
    # namedtuple fields are read-only, so build a new tuple with _replace
    sample.data = sample.data._replace(GT="./.")
It appears that each sample has a namedtuple specific to what's present in the record's FORMAT field. If I'm misunderstanding what's happening, the more foolproof solution would be to have the writer look at the FORMAT field.
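The FORMAT-driven alternative could look roughly like the sketch below. It is not PyVCF's writer code: `format_call_from_format` and `CallData` are hypothetical names, and the function assumes the FORMAT column is available as a colon-separated string.

```python
from collections import namedtuple


def format_call_from_format(format_string, data):
    """Render one sample column by walking the record's FORMAT keys."""
    out = []
    for name in format_string.split(":"):
        # A key missing from the namedtuple is treated like an unset value.
        value = getattr(data, name, None)
        if value is None:
            out.append("./." if name == "GT" else ".")
        else:
            out.append(str(value))
    return ":".join(out)


# Hypothetical stand-in for the per-record namedtuple.
CallData = namedtuple("CallData", ["GT", "GQ", "DP"])

with_data = format_call_from_format("GT:GQ:DP", CallData(None, 35, 4))
all_missing = format_call_from_format("GT:GQ:DP", CallData(None, None, None))
```

Here `with_data` comes out as "./.:35:4" and `all_missing` as "./.:.:.". Note that this version pads every FORMAT key, so on its own it would not preserve a shorter column exactly as it was read.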