vcf2maf
--max-subpop-af: how can this option be deactivated and can it be applied to other populations?
Dear vcf2maf Team,
From the documentation, I can see that --max-subpop-af applies to gnomAD subpopulations and adds a FILTER tag if the AF is greater than 0.04%.
Please could you advise on the following:
- How can this option be deactivated?
- Why were gnomAD subpopulations selected rather than e.g. 1000 Genomes Phase 3 subpopulations?
- Would it be possible to add an option for the user to specify which population(s) --max-subpop-af should be applied to when running the script?
Thank you for your help. I look forward to your reply.
Hi @ISmolicz. The --max-subpop-af option cannot be deactivated, but it is trivial to remove the common_variant tag in a downstream step. It is also trivial to filter based on specific subpopulations in a downstream step. If you let me know what programming languages you are familiar with, I can provide a script. gnomAD is preferred for minor allele frequencies because the larger cohorts provide rarer variants. 1000 Genomes Phase 3 is only about 2,500 individuals (roughly 5,000 alleles), so an AF of 0.04% would mean only 2 alleles.
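As a rough illustration of the downstream step mentioned above, the following Python sketch strips the common_variant tag from the FILTER column of a MAF. It is not part of vcf2maf. It assumes a tab-delimited MAF with a column named FILTER whose tags are separated by semicolons, and the function names `strip_filter_tag` and `untag_maf` are made up for this example.

```python
def strip_filter_tag(filter_value, tag="common_variant"):
    """Remove `tag` from a FILTER value; return "PASS" if no tags remain.

    Assumption: multiple FILTER tags are joined with ";".
    """
    tags = filter_value.split(";")
    if tag not in tags:
        return filter_value  # nothing to remove, leave the value untouched
    kept = [t for t in tags if t != tag]
    return ";".join(kept) if kept else "PASS"


def untag_maf(maf_text, tag="common_variant"):
    """Return MAF text with `tag` removed from the FILTER column of every row."""
    out = []
    filter_idx = None
    for line in maf_text.splitlines():
        # Pass comment lines (e.g. "#version 2.4") and blank lines through unchanged.
        if line.startswith("#") or not line.strip():
            out.append(line)
            continue
        fields = line.split("\t")
        if filter_idx is None:
            # First non-comment line is the header; locate the FILTER column.
            filter_idx = fields.index("FILTER")
            out.append(line)
            continue
        fields[filter_idx] = strip_filter_tag(fields[filter_idx], tag)
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    demo = (
        "#version 2.4\n"
        "Hugo_Symbol\tFILTER\n"
        "TP53\tcommon_variant\n"
        "KRAS\tPASS\n"
        "EGFR\tlow_qual;common_variant\n"
    )
    print(untag_maf(demo), end="")
```

Rows that carried only common_variant are reset to PASS, while other tags are left in place, so any FILTER values set by the variant caller survive.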
Hi @ckandoth, Thank you for your reply and for the information. It helps to be reminded that the filtering does not remove variants but adds a FILTER tag.
I am mainly familiar with R and Bash. I have used maftools to filter since posting the issue, but if you have a preferred method, please do provide a script, as it would be useful to compare. I can also provide the method I have used if that would be helpful.
Can I just confirm:
- Does vcf2maf refer to the gnomAD exome populations, as it would if VEP were run independently?
- When you say that 'larger cohorts provide rarer variants', does this predominantly imply that by using gnomAD, more variants would remain following filtering compared to if using smaller populations?
Thank you again.
maftools is a great choice for filtering. Stick with that.
- Good point - VEP only caches the gnomAD exome allele frequencies, not the gnomAD genomes, probably because it was too large to fit in the VEP cache. If you want to filter non-exonic variants, then use the 1000genomes AFs with maftools filters.
- The "common_variant" filter is for somatic variants - it is common for germline heterozygous variants to be incorrectly called somatic. This is where it helps to have comprehensive germline variant databases like gnomAD to help tag potential false-positive somatic variants.
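For anyone who wants to apply the same kind of threshold to populations of their own choosing, here is a minimal Python sketch of a per-row check. It is a plain AF comparison written for illustration, not a reproduction of how vcf2maf itself decides to add the tag. The column names used below (`gnomAD_AFR_AF`, `gnomAD_EAS_AF`, `EUR_AF`) are assumptions about what the annotated MAF contains, so check the header of your own file, and the helper names are hypothetical.

```python
def max_population_af(row, af_columns):
    """Return the highest AF found in the chosen columns, or None if none has a value."""
    values = []
    for col in af_columns:
        raw = row.get(col, "")
        try:
            values.append(float(raw))
        except (TypeError, ValueError):
            # Empty or non-numeric cells mean "no frequency reported"; skip them.
            continue
    return max(values) if values else None


def is_common(row, af_columns, max_af=0.0004):
    """True if any chosen population AF exceeds max_af (0.0004 is 0.04%)."""
    top = max_population_af(row, af_columns)
    return top is not None and top > max_af


if __name__ == "__main__":
    # One MAF row represented as a dict, e.g. as produced by csv.DictReader.
    row = {"gnomAD_AFR_AF": "0.001", "gnomAD_EAS_AF": "", "EUR_AF": "0.0002"}
    print(is_common(row, ["gnomAD_AFR_AF"]))            # compares 0.001 against 0.0004
    print(is_common(row, ["gnomAD_EAS_AF", "EUR_AF"]))  # compares 0.0002 against 0.0004
```

Variants with no reported frequency in the chosen populations are treated as not common here; whether that is the right default depends on how much of the genome those populations cover.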