Scripts to extract variant allele frequency (VAF)

Open maximus3219 opened this issue 1 year ago • 1 comment

Variant allele frequency (VAF), allele depth (AD), and depth (DP) are fundamental to interpreting NGS data, but unfortunately they are not readily available in the outputs from Strelka. If there is no plan to add them to the outputs, could you provide a bash script that extracts this information into separate columns, or that filters the variants directly on the values of VAF, AD, and DP? bcftools can filter on such values when they are available in the INFO or FORMAT fields, e.g. bcftools filter -i 'FORMAT/AF[1] > 0.05' input.vcf.gz

But unfortunately, extracting this information is extremely complicated, as stated in the manual:

refCounts = Value of FORMAT column $REF + "U" (e.g. if REF="A" then use the value in FORMAT/AU)
altCounts = Value of FORMAT column $ALT + "U" (e.g. if ALT="T" then use the value in FORMAT/TU)
tier1RefCounts = First comma-delimited value from $refCounts
tier1AltCounts = First comma-delimited value from $altCounts
Somatic allele frequency is $tier1AltCounts / ($tier1AltCounts + $tier1RefCounts)

How exactly can I implement the above pseudocode in the bash script with bcftools or other tools?
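For illustration, here is a minimal Python sketch of the manual's pseudocode for a single SNV record. It is not an official Strelka tool. The function names `tier1_count` and `snv_vaf` are made up for this example, and it assumes a single ALT allele and that the FORMAT and sample columns are passed in as the raw colon-delimited strings from the VCF line.

```python
from typing import Optional


def tier1_count(field_value: str) -> int:
    """Return the first (tier 1) comma-delimited value of a count field."""
    return int(field_value.split(",")[0])


def snv_vaf(ref: str, alt: str, format_col: str, sample_col: str) -> Optional[float]:
    """Somatic allele frequency for one sample of a Strelka SNV record.

    Hypothetical helper: assumes one ALT allele and that FORMAT contains
    the AU/CU/GU/TU count fields described in the manual.
    """
    # Pair each FORMAT key with the sample's value, e.g. {"AU": "20,21", ...}
    fmt = dict(zip(format_col.split(":"), sample_col.split(":")))
    ref_count = tier1_count(fmt[ref + "U"])  # e.g. REF="A" -> FORMAT/AU
    alt_count = tier1_count(fmt[alt + "U"])  # e.g. ALT="T" -> FORMAT/TU
    total = ref_count + alt_count
    # Avoid division by zero when neither allele has tier 1 support
    return alt_count / total if total else None


if __name__ == "__main__":
    vaf = snv_vaf("A", "T",
                  "DP:FDP:SDP:SUBDP:AU:CU:GU:TU",
                  "30:0:0:0:20,21:0,0:0,0:10,10")
    print(vaf)  # 10 / (10 + 20)
```

Returning `None` for a zero denominator is a design choice of this sketch; a filtering step would need to decide whether such records are kept or dropped.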

I have searched hundreds of webpages, and no one gives a solution or even discusses it!!

maximus3219, Oct 18 '23 15:10

@maximus3219 I don't know if you are still interested, but I had the same problem last week. I wrote a Python script to calculate VAF for indels and SNVs from the somatic VCF files. I couldn't get it done with bcftools either, but here is the script. It calculates the VAF for each variant and includes this information for the normal and tumour samples in the final output VCF. Usage instructions are in the README.md.

juliawiggeshoff, Dec 18 '23 09:12
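The script mentioned in the reply is not reproduced in this thread. As a separate, minimal sketch of the same idea (not juliawiggeshoff's code), the Python below computes a VAF per sample from one tab-delimited record of a Strelka somatic VCF. It relies on several assumptions: the sample columns are NORMAL then TUMOR, each record has a single ALT allele, SNV records carry the AU/CU/GU/TU fields, and indel records carry TAR and TIR, whose first values are taken as the tier 1 reference and alternate counts as the Strelka user guide describes for indels. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def tier1(field_value: str) -> int:
    """First (tier 1) comma-delimited value of a Strelka count field."""
    return int(field_value.split(",")[0])


def sample_vaf(ref: str, alt: str, keys: List[str], sample_col: str) -> Optional[float]:
    """VAF for one sample column of an SNV or indel record."""
    fmt = dict(zip(keys, sample_col.split(":")))
    if "TAR" in fmt and "TIR" in fmt:
        # Indel record: TAR = reference-supporting reads, TIR = indel-supporting reads
        ref_count, alt_count = tier1(fmt["TAR"]), tier1(fmt["TIR"])
    else:
        # SNV record: per-base counts live in AU, CU, GU, TU
        ref_count, alt_count = tier1(fmt[ref + "U"]), tier1(fmt[alt + "U"])
    total = ref_count + alt_count
    return alt_count / total if total else None


def record_vafs(line: str,
                sample_names: Sequence[str] = ("NORMAL", "TUMOR")) -> Dict[str, Optional[float]]:
    """Map each sample name to its VAF for one VCF data line (not a header line)."""
    fields = line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]   # REF and ALT columns
    keys = fields[8].split(":")       # FORMAT column
    return {name: sample_vaf(ref, alt, keys, col)
            for name, col in zip(sample_names, fields[9:])}


if __name__ == "__main__":
    snv = "\t".join(["chr1", "100", ".", "A", "T", ".", "PASS", "SOMATIC",
                     "DP:FDP:SDP:SUBDP:AU:CU:GU:TU",
                     "40:0:0:0:40,40:0,0:0,0:0,0",
                     "50:0:0:0:30,31:0,0:0,0:20,22"])
    print(record_vafs(snv))
```

To process a whole file, one could open the VCF (with `gzip.open(path, "rt")` if compressed), skip lines starting with `#`, and call `record_vafs` on each remaining line.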