Revision of SB INFO Number and type

Open eyherabh opened this issue 3 years ago • 0 comments

#290 fixed #189, which requested a definition of the SB INFO field based on the observation that GATK defines SB as Number=4,Type=Integer (https://github.com/samtools/hts-specs/issues/189#issue-209808608). However, that definition applies to the per-sample SB, that is, to the SB FORMAT field. All VCFs in the gatk repository are consistent with this:

find . -type f -name "*.vcf" -exec grep -h "^##FORMAT=<ID=SB," {} \; | sort | uniq -c
    225 ##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">

For SB in the INFO field, GATK instead uses Number=1 and Type=Float, as shown below:

find . -type f -name "*.vcf" -exec grep -h "^##INFO=<ID=SB," {} \; | sort | uniq -c
     41 ##INFO=<ID=SB,Number=1,Type=Float,Description="Strand bias">
     21 ##INFO=<ID=SB,Number=1,Type=Float,Description="Strand Bias">
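For anyone wanting to repeat the check, the two searches can be combined into one pass. The following is only a sketch: `sb_headers` is a hypothetical helper name (not part of GATK or hts-specs), and stripping the Description is my own addition so that definitions differing only in description text (such as "Strand bias" vs "Strand Bias") are counted together.

```shell
# Tally SB header definitions (INFO and FORMAT) in every *.vcf under a directory.
# sb_headers is a hypothetical helper, written for illustration only.
sb_headers() {
    # $1: directory to search (defaults to the current directory)
    find "${1:-.}" -type f -name "*.vcf" \
        -exec grep -hE '^##(INFO|FORMAT)=<ID=SB,' {} + \
        | sed -E 's/,Description=.*//' \
        | sort | uniq -c
}
```

Running `sb_headers .` from the root of a checkout should print one counted line per distinct combination of field kind, Number and Type, which makes any mismatch between the INFO and FORMAT definitions easy to spot.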

I think the decision in #290 and #189 should be revised.

eyherabh, Sep 24 '21 07:09