PyVCF

Incorrect uncalled call writing

Open lennax opened this issue 12 years ago • 2 comments

Sometime between June 21 and now, the first sample in the last row of walk_left.vcf started being written as "./." instead of "./.:35:4".

Probably something I did, so I'll look into it asap.

But it makes me curious - if the call isn't called, is the data associated with it useful? I recall grepping the test directory for ./.: followed by a digit and only finding this file - is the ./.:35:4 something that never appears in nature?

If this is the case, would it be more correct to print "./.:.:."?

lennax, Jul 03 '12 02:07

Good spot, Lenna!

Looks like it has been around for a while: https://github.com/jamescasbon/PyVCF/blob/master/vcf/parser.py#L589

The correct behaviour for the writer is to write the same number of fields as it read, so we cannot just return './.' when there is no GT. Also, an entry need not define all the fields, so we need to track how many it did define (now that we are using a namedtuple, we cannot use the non-existence of a key to tell that a field wasn't reported).
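To illustrate the tracking problem, here is a minimal sketch, not PyVCF's actual parser. The `CallData` tuple, the `GT:DP:GQ` layout, and the `parse_sample` helper are all assumptions made up for the example. It shows that a truncated entry and a fully written one collapse to the same namedtuple, so the reported field count has to be kept separately.

```python
from collections import namedtuple

# Hypothetical per-record call data for an assumed FORMAT of GT:DP:GQ.
CallData = namedtuple("CallData", ["GT", "DP", "GQ"])


def parse_sample(text, fields=CallData._fields):
    """Parse one sample column; return (data, number of fields reported)."""
    raw = text.split(":")
    # Pad with None so the namedtuple always carries every FORMAT field.
    padded = raw + [None] * (len(fields) - len(raw))
    # Treat '.', './.' and padding as missing values.
    values = [None if v in (".", "./.", None) else v for v in padded]
    return CallData(*values), len(raw)


short, n_short = parse_sample("./.")
full, n_full = parse_sample("./.:.:.")

print(short == full)    # the two namedtuples are indistinguishable
print(n_short, n_full)  # only the separate count tells them apart
```

With the count stored next to the data, a writer could emit exactly as many fields as were read.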

Do you fancy adding some tests for this to the writer?

jamescasbon, Jul 04 '12 08:07

I'm wondering if a simple fix would be this:

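if sample.data.GT is None:
    # namedtuple fields are read-only, so build a new tuple instead of assigning to GT
    sample.data = sample.data._replace(GT="./.")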

It appears that each sample has a namedtuple specific to what's present in the record's FORMAT field. If I'm misunderstanding what's happening, the more foolproof solution would be to have the writer look at the FORMAT field.
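As a rough sketch of the FORMAT-driven alternative: the `format_sample` helper, the `CallData` tuple, and the `GT:DP:GQ` layout below are hypothetical, not PyVCF's writer. The idea is to walk the record's FORMAT string and emit one value per field, substituting a placeholder where the value is missing.

```python
from collections import namedtuple


def format_sample(format_field, data):
    """Render one sample column with a value for every field named in FORMAT."""
    parts = []
    for name in format_field.split(":"):
        value = getattr(data, name, None)
        if value is None:
            # An uncalled genotype is written as './.', other missing values as '.'.
            parts.append("./." if name == "GT" else ".")
        else:
            parts.append(str(value))
    return ":".join(parts)


# Hypothetical call data for an assumed FORMAT of GT:DP:GQ.
CallData = namedtuple("CallData", ["GT", "DP", "GQ"])

print(format_sample("GT:DP:GQ", CallData(None, 35, 4)))        # ./.:35:4
print(format_sample("GT:DP:GQ", CallData(None, None, None)))   # ./.:.:.
```

Because the output is driven by FORMAT rather than by the sample's own data, this always writes the full set of fields. It would not reproduce an entry that was originally truncated.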

lennax, Jul 07 '12 17:07