Subset should modify multiallelic sites

Open AlistairNWard opened this issue 9 years ago • 1 comment

When subsetting a VCF file, it would be useful to trim from each multiallelic site every allele that is not present in the samples being subset on. For example, consider the following entry:

1 100 . CTTT CT,C 100 PASS ... 0/2 ...

The genotype for the sample being subset on is 0/2, so the record needs to be retained when subsetting, but there is no need to retain the 'CT' allele. Dropping that allele also requires trimming every INFO field that has one entry per alternate allele.
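
To make the request concrete, here is a minimal Python sketch of the trimming step. It is not vt's implementation and not an existing vt option; the function name `trim_unused_alts` and the `AC` values in the example are hypothetical. It assumes genotypes are plain GT strings such as 0/2 or 1|2, and that the per-allele INFO fields passed in have exactly one value per ALT allele. Note that dropping 'CT' means the remaining 'C' allele becomes allele 1, so the genotype 0/2 has to be renumbered to 0/1.

```python
import re


def trim_unused_alts(alts, genotypes, per_alt_info=None):
    """Drop ALT alleles that no retained genotype refers to.

    alts         -- list of ALT allele strings, e.g. ["CT", "C"]
    genotypes    -- GT strings of the retained samples, e.g. ["0/2"]
    per_alt_info -- dict of INFO key -> list with one value per ALT allele
    Returns (new_alts, new_genotypes, new_per_alt_info).
    """
    per_alt_info = per_alt_info or {}

    # Collect every allele index used by the retained samples.
    used = set()
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):
            if allele != ".":
                used.add(int(allele))

    # ALT indices are 1-based; index 0 is REF and is always kept.
    kept = [i for i in range(1, len(alts) + 1) if i in used]
    remap = {0: 0}
    for new_index, old_index in enumerate(kept, start=1):
        remap[old_index] = new_index

    new_alts = [alts[i - 1] for i in kept]

    # Renumber the genotypes, preserving separators and missing calls.
    new_genotypes = []
    for gt in genotypes:
        parts = re.split(r"([/|])", gt)
        new_genotypes.append("".join(
            p if p in ("/", "|", ".") else str(remap[int(p)])
            for p in parts
        ))

    # Keep only the per-ALT INFO values that belong to retained alleles.
    new_info = {key: [values[i - 1] for i in kept]
                for key, values in per_alt_info.items()}
    return new_alts, new_genotypes, new_info


# The record from this issue, with made-up AC values for illustration.
result = trim_unused_alts(["CT", "C"], ["0/2"], {"AC": [3, 5]})
print(result)
```

The sketch leaves REF untouched and does not handle fields with one value per allele including REF, or per-genotype fields such as likelihoods; a real implementation would need to trim those as well.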

It is possible to use 'vt decompose | vt subset' to get rid of the alternate allele that isn't present, but this modifies the values supplied in the genotype fields, so it isn't necessarily a desirable solution.

AlistairNWard, May 19 '15 20:05

It might be a good idea to add an option in vt subset to perform this.

atks, May 19 '15 21:05