Subset should modify multiallelic sites
When subsetting a VCF file, it would be useful to trim from each multiallelic site any alternate alleles that are not present in the samples being retained. For example, consider the following entry:
1 100 . CTTT CT,C 100 PASS ... 0/2 ...
The genotype of the retained sample is 0/2, so the record must be kept, but there is no need to keep the 'CT' allele. Trimming it also requires trimming every INFO field that has one entry per alternate allele (those declared with Number=A in the header).
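To make the requested behaviour concrete, here is a minimal sketch of the per-record trimming step. It is not vt code: `trim_unused_alleles` is a hypothetical helper, and it assumes the record has already been parsed into an ALT list, one GT string per retained sample, and an INFO dict whose values are lists. It keeps only the ALT alleles called in some retained sample, renumbers the GT indices to match (so 0/2 becomes 0/1 once 'CT' is dropped), and trims the listed Number=A INFO fields. Per-sample fields declared Number=R or Number=G (such as AD or PL) would also need trimming and are not handled here.

```python
import re
from typing import Dict, List, Tuple


def trim_unused_alleles(
    alts: List[str],
    genotypes: List[str],
    info: Dict[str, List[str]],
    number_a_keys: List[str],
) -> Tuple[List[str], List[str], Dict[str, List[str]]]:
    """Hypothetical helper: drop ALT alleles not called in any retained sample.

    Returns the new ALT list, the remapped GT strings, and a copy of INFO
    with the given Number=A fields trimmed to the kept alleles.
    """
    # Collect the 1-based ALT indices that appear in any genotype ('.' = missing).
    used = set()
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):
            if allele != "." and int(allele) > 0:
                used.add(int(allele))
    kept = sorted(used)  # if empty, no ALT is called; the caller decides what to do

    # Old allele index -> new index; REF (0) always stays 0.
    remap = {0: 0}
    for new_idx, old_idx in enumerate(kept, start=1):
        remap[old_idx] = new_idx
    new_alts = [alts[i - 1] for i in kept]

    def remap_gt(gt: str) -> str:
        # Split with a capturing group so the '/' or '|' separators are kept,
        # which preserves phasing.
        parts = re.split(r"([/|])", gt)
        return "".join(
            p if p in ("/", "|", ".") else str(remap[int(p)]) for p in parts
        )

    new_gts = [remap_gt(gt) for gt in genotypes]

    new_info = dict(info)
    for key in number_a_keys:
        if key in new_info:
            # Number=A: one value per ALT allele, in ALT order.
            new_info[key] = [new_info[key][i - 1] for i in kept]
    return new_alts, new_gts, new_info
```

For the example record, `trim_unused_alleles(["CT", "C"], ["0/2"], {"AC": ["3", "5"], "DP": ["100"]}, ["AC"])` returns `(["C"], ["0/1"], {"AC": ["5"], "DP": ["100"]})`; the AC values are made up for illustration. Renumbering the GT indices is unavoidable once an allele is removed, but unlike decomposition the record stays a single multiallelic line.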
It is possible to use 'vt decompose | vt subset' to drop the absent alternate allele, but this modifies the values in the genotype fields, so it is not necessarily a desirable solution.
It might be a good idea to add an option to vt subset that performs this trimming.