vawk
An awk-like VCF parser
Given a multisample VCF file, if the sample name is given as an external variable, such as `sample_name="NA00001"` `zcat MULTI_VCF | vawk -v ext_var=$sample_name '{print...
Is there a limit on the number of INFO fields to print? I was trying to print all the dbNSFP-annotated INFO fields and got the error below: ``` Traceback (most recent call...
`I$*` to print all INFO fields
Still works with Python 2 and now works well with Python 3
`vawk '{print I$dbNSFP_GERP++_RS}'` on the input VCF prints only 0. Example line from the VCF to test: `11 47353646 . C T . . dbNSFP_GERP++_RS=4.33;dbNSFP_GERP++_RS_rankscore=0.52;dbNSFP_phyloP46way_primate=-0.41;`
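To check the expected value for a report like this independently of `vawk`, a minimal Python sketch can split the INFO column by hand. This is not how `vawk` parses INFO; `parse_info` is a hypothetical helper introduced only for illustration, and the test line is the one quoted in the report.

```python
def parse_info(info_column):
    """Split a VCF INFO column into a dict; keys without '=' (flags) map to True."""
    fields = {}
    for entry in info_column.split(";"):
        if not entry:
            continue  # tolerate a trailing ';' as in the example line
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


# Example line from the report. Real VCF is tab-delimited; split() with no
# argument accepts both tabs and the spaces shown here.
line = ("11 47353646 . C T . . "
        "dbNSFP_GERP++_RS=4.33;dbNSFP_GERP++_RS_rankscore=0.52;"
        "dbNSFP_phyloP46way_primate=-0.41;")
columns = line.split()
info = parse_info(columns[7])  # INFO is the 8th column
print(info["dbNSFP_GERP++_RS"])  # prints 4.33
```

Because the key is treated as a plain string here, the `++` in `dbNSFP_GERP++_RS` needs no special handling.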
If the VCF header is missing from the input data stream, `vawk` seems to silently fail when constructing an awk program.
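One way to make a missing header fail loudly rather than silently is to check for the `#CHROM` line before the first record. The following is a sketch under that assumption, not `vawk`'s actual code; `require_header` is a hypothetical name.

```python
import io


def require_header(stream):
    """Yield lines unchanged, raising if a record appears before the '#CHROM' line."""
    seen_header = False
    for line in stream:
        if line.startswith("#CHROM"):
            seen_header = True
        elif not line.startswith("#") and not seen_header:
            raise ValueError(
                "VCF header line (#CHROM ...) not found before first record"
            )
        yield line


# A headerless stream is rejected at its first record.
headerless = io.StringIO("11\t47353646\t.\tC\tT\t.\t.\tDP=10\n")
try:
    list(require_header(headerless))
except ValueError as err:
    print("error:", err)
```

The check is a generator, so it can wrap standard input without reading the whole file into memory.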
```
time zcat Omni25_genotypes_2141_samples.b37.v2.vcf.gz | vawk --header '{ print $1,$2,$3,$4,$5,$6,$7,$8,$9,S$NA12878 }' | bgzip -c > NA12878.omni.vcf.gz
# real 19m43.893s
# user 21m27.138s
# sys 0m59.355s
# aside: outside python it's...
```
1. Bypass the "if in header" statement after the top of the file
2. Split the INFO field by using multiple grep, and moving the 'or' statement inside the loop (rather than many...