VCF header needs more information

Open rmcolq opened this issue 6 years ago • 2 comments

All VCFs need the following in their header:

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MEAN_FWD_COVG,Number=A,Type=Integer,Description="Mean forward coverage">
##FORMAT=<ID=MEAN_REV_COVG,Number=A,Type=Integer,Description="Mean reverse coverage">
##FORMAT=<ID=MED_FWD_COVG,Number=A,Type=Integer,Description="Med forward coverage">
##FORMAT=<ID=MED_REV_COVG,Number=A,Type=Integer,Description="Med reverse coverage">
##FORMAT=<ID=SUM_FWD_COVG,Number=A,Type=Integer,Description="Sum forward coverage">
##FORMAT=<ID=SUM_REV_COVG,Number=A,Type=Integer,Description="Sum reverse coverage">
##FORMAT=<ID=GAPS,Number=A,Type=Float,Description="Number of gap bases">
##FORMAT=<ID=LIKELIHOOD,Number=A,Type=Float,Description="Likelihood">
##FORMAT=<ID=GT_CONF,Number=1,Type=Float,Description="Genotype confidence">

as well as a ##contig=<ID=$id> line for each $id that appears in the CHROM field.
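A minimal sketch of how such a header could be assembled, not pandora's actual implementation. The function name `build_header` and the `##fileformat=VCFv4.3` line are assumptions made for illustration; the FORMAT lines are copied verbatim from the list above, and the contig IDs are written unquoted, as in the examples in the VCF specification.

```python
# FORMAT lines copied verbatim from the issue text above.
FORMAT_LINES = [
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FORMAT=<ID=MEAN_FWD_COVG,Number=A,Type=Integer,Description="Mean forward coverage">',
    '##FORMAT=<ID=MEAN_REV_COVG,Number=A,Type=Integer,Description="Mean reverse coverage">',
    '##FORMAT=<ID=MED_FWD_COVG,Number=A,Type=Integer,Description="Med forward coverage">',
    '##FORMAT=<ID=MED_REV_COVG,Number=A,Type=Integer,Description="Med reverse coverage">',
    '##FORMAT=<ID=SUM_FWD_COVG,Number=A,Type=Integer,Description="Sum forward coverage">',
    '##FORMAT=<ID=SUM_REV_COVG,Number=A,Type=Integer,Description="Sum reverse coverage">',
    '##FORMAT=<ID=GAPS,Number=A,Type=Float,Description="Number of gap bases">',
    '##FORMAT=<ID=LIKELIHOOD,Number=A,Type=Float,Description="Likelihood">',
    '##FORMAT=<ID=GT_CONF,Number=1,Type=Float,Description="Genotype confidence">',
]


def build_header(chroms):
    """Return header lines: fileformat, one contig line per distinct CHROM, then FORMAT lines.

    `chroms` is any iterable of CHROM values; duplicates are skipped and
    first-seen order is preserved.
    """
    lines = ["##fileformat=VCFv4.3"]  # assumed version, not stated in the issue
    seen = set()
    for chrom in chroms:
        if chrom not in seen:
            seen.add(chrom)
            lines.append(f"##contig=<ID={chrom}>")
    lines.extend(FORMAT_LINES)
    return lines
```

Calling `build_header` with the CHROM column of the records about to be written yields the meta-information lines to emit before the `#CHROM` column header line.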

rmcolq, Aug 06 '19 10:08

In addition, section 1.4 of the VCF v4.3 spec says:

File meta-information is included after the ## string and must be key=value pairs. Meta-information lines are optional, but if they are present then they must be completely well-formed. Note that BCF, the binary counterpart of VCF, requires that all entries are present. It is recommended to include meta-information lines describing the entries used in the body of the VCF file.

mbhall88, Aug 06 '19 10:08

We do not write ##contig lines to the header. They are not required for VCF, but they are required for BCF, and their absence is causing problems for me in further analyses.
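Until the header is fixed, one possible workaround is to patch an existing VCF before converting it to BCF. The sketch below is an illustration only: `add_missing_contigs` is a hypothetical helper, it assumes the file has been read into a list of lines without trailing newlines, and it inserts an unquoted `##contig=<ID=...>` line for every CHROM value that the header does not already declare.

```python
def add_missing_contigs(vcf_lines):
    """Return a copy of vcf_lines with a ##contig line for each undeclared CHROM.

    New contig lines are inserted immediately before the #CHROM column
    header line, in order of first appearance in the records.
    """
    prefix = "##contig=<ID="
    declared = set()
    used = {}  # dict used as an ordered set of CHROM values
    for line in vcf_lines:
        if line.startswith(prefix):
            # The ID runs up to the first comma or the closing bracket.
            ident = line[len(prefix):].split(",")[0].rstrip(">").strip('"')
            declared.add(ident)
        elif line.strip() and not line.startswith("#"):
            # Record line: CHROM is the first tab-separated column.
            used.setdefault(line.split("\t", 1)[0], None)

    missing = [chrom for chrom in used if chrom not in declared]

    patched = []
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            patched.extend(f"{prefix}{chrom}>" for chrom in missing)
        patched.append(line)
    return patched
```

Running the function on an already patched file returns it unchanged, so it can be applied safely to a mix of old and new outputs.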

mbhall88, Jan 26 '21 06:01