
--max-subpop-af: how can this option be deactivated and can it be applied to other populations?

Open ISmolicz opened this issue 3 years ago • 3 comments

Dear vcf2maf Team,

From the documentation, I can see that --max-subpop-af applies to gnomAD subpopulations and adds a FILTER tag if the AF is greater than 0.04%.

Please could you advise on the following:

  1. How can this option be deactivated?
  2. Why were gnomAD subpopulations selected rather than e.g. 1000 Genomes Phase 3 subpopulations?
  3. Would it be possible to add an option for the user to specify which population(s) --max-subpop-af should be applied to when running the script?

Thank you for your help. I look forward to your reply.

ISmolicz, Jun 14 '21 17:06

Hi @ISmolicz. The --max-subpop-af option cannot be deactivated, but it is trivial to remove the common_variant tag in a downstream step. It is also trivial to filter on specific subpopulations in a downstream step. If you let me know which programming languages you are familiar with, I can provide a script. gnomAD is preferred for minor allele frequencies because the larger cohorts provide rarer variants. 1000 Genomes Phase 3 is only about 2,500 individuals (roughly 5,000 alleles), so an AF of 0.04% would mean only 2 alleles.
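For illustration, the downstream tag removal might look something like the Python sketch below. It is not part of vcf2maf. It assumes the MAF is tab-delimited with optional `#` comment lines before the header, that the column is named `FILTER`, and that multiple tags in that column are separated by semicolons, as in VCF. The function names are hypothetical.

```python
def strip_filter_tag(filter_value, tag="common_variant"):
    """Remove one tag from a semicolon-separated FILTER value (assumed layout)."""
    if filter_value in ("", ".", "PASS"):
        return filter_value  # nothing to strip
    kept = [t for t in filter_value.split(";") if t != tag]
    # If the removed tag was the only one, the variant now passes all filters.
    return ";".join(kept) if kept else "PASS"


def strip_tag_from_maf(lines, tag="common_variant"):
    """Yield MAF lines with `tag` removed from the FILTER column."""
    filter_idx = None
    for line in lines:
        if line.startswith("#"):
            yield line  # e.g. the "#version" line, passed through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if filter_idx is None:
            # First non-comment line is the header; locate the FILTER column.
            filter_idx = fields.index("FILTER")
            yield line
            continue
        fields[filter_idx] = strip_filter_tag(fields[filter_idx], tag)
        yield "\t".join(fields) + "\n"


# Usage sketch (file names are placeholders):
# with open("input.maf") as src, open("output.maf", "w") as dst:
#     dst.writelines(strip_tag_from_maf(src))
```

Variants whose only tag was common_variant come out as PASS; any other tags are left in place.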

ckandoth, Jun 16 '21 15:06

Hi @ckandoth, Thank you for your reply and for the information. It helps to be reminded that the filtering does not remove variants but adds a FILTER tag.

I am mainly familiar with R and Bash. I have used maftools to filter since posting the issue, but if you have a preferred method, please do provide a script, as it would be useful to compare. I can also provide the method I have used if that would be helpful.

Can I just confirm:

  1. vcf2maf uses the gnomAD exome populations, the same as when running VEP independently?
  2. When you say that 'larger cohorts provide rarer variants', does this mainly imply that by using gnomAD, more variants would remain after filtering than if smaller populations were used?

Thank you again.

ISmolicz, Jun 16 '21 16:06

maftools is a great choice for filtering. Stick with that.

  1. Good point - VEP only caches the gnomAD exome allele frequencies, not the gnomAD genomes, probably because the genome data were too large to fit in the VEP cache. If you want to filter non-exonic variants, then use the 1000genomes AFs with maftools filters.

  2. The "common_variant" filter is for somatic variants - it is common for germline heterozygous variants to be incorrectly called somatic. This is where it helps to have comprehensive germline variant databases like gnomAD to help tag potential false-positive somatic variants.
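As a rough sketch of how the tagging could be re-applied to populations of the user's choosing (the third question in the original post), the Python below tags a row as common_variant when any selected AF column exceeds a cutoff. It is an illustration, not vcf2maf's own logic. The 1000 Genomes column names in `KG_COLUMNS` are an assumption and should be checked against the header of the actual MAF. The 0.0004 default mirrors the 0.04% mentioned above, and the function names are hypothetical.

```python
# Assumed 1000 Genomes AF column names; verify against your MAF header.
KG_COLUMNS = ["AFR_AF", "AMR_AF", "EAS_AF", "EUR_AF", "SAS_AF"]


def max_af(row, af_columns):
    """Largest AF among the chosen columns; missing or blank values are ignored."""
    values = []
    for col in af_columns:
        try:
            values.append(float(row.get(col, "")))
        except (TypeError, ValueError):
            continue  # blank or non-numeric entry
    return max(values, default=0.0)


def tag_common(row, af_columns, max_subpop_af=0.0004, tag="common_variant"):
    """Return the FILTER value, with `tag` added if any chosen AF exceeds the cutoff."""
    current = row.get("FILTER", "")
    if max_af(row, af_columns) <= max_subpop_af:
        return current
    if current in ("", ".", "PASS"):
        return tag
    # Avoid adding the tag twice if it is already present.
    return current if tag in current.split(";") else current + ";" + tag
```

Each `row` here is a dict keyed by column name, for example one produced by `csv.DictReader` with `delimiter="\t"`. Because 0.04% of the 1000 Genomes Phase 3 cohort corresponds to only about 2 alleles, a different `max_subpop_af` may be more sensible when using these columns.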

ckandoth, Jun 19 '21 04:06