dbSNP 156 VCF now includes non-32 bit integers causing "Extreme INFO/RS value encountered and set to missing" errors
With release 156, dbSNP now includes rsIDs above the signed 32-bit limit (2^31 - 1), which bcftools can no longer handle properly:
$ wget https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.40.gz{,.tbi}
$ tabix GCF_000001405.40.gz NC_000001.11:6259533-6259533
NC_000001.11 6259533 rs2148352434 C T . . RS=2148352434;dbSNPBuildID=156;SSR=0;GENEINFO=GPR153:387509;VC=SNV;INT;R5;GNO;FREQ=1000Genomes:0.9998,0.0001562
$ bcftools view -H GCF_000001405.40.gz -r NC_000001.11:6259533-6259533
[W::vcf_parse_info] Extreme INFO/RS value encountered and set to missing at NC_000001.11:6259533
NC_000001.11 6259533 rs2148352434 C T . . RS=.;dbSNPBuildID=156;SSR=0;GENEINFO=GPR153:387509;VC=SNV;INT;R5;GNO;FREQ=1000Genomes:0.9998,0.0001562
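For reference, the rsID in this record is just past the largest value a signed 32-bit integer can hold (2147483647). A minimal shell sketch of that range check follows. The function name fits_int32 is hypothetical and not part of bcftools or HTSlib, and the sketch assumes a shell whose arithmetic is 64-bit (as in common builds of bash or dash).

```shell
# Hypothetical helper (not part of bcftools/HTSlib): succeeds if the
# argument fits in a signed 32-bit integer.
# Assumes the shell evaluates $(( )) with 64-bit integers.
fits_int32() {
    [ "$(( $1 >= -2147483648 && $1 <= 2147483647 ))" -eq 1 ]
}

rs=2148352434   # INFO/RS value from the record above
if fits_int32 "$rs"; then
    echo "RS=$rs fits in int32"
else
    echo "RS=$rs exceeds the int32 maximum by $(( rs - 2147483647 ))"
fi
```

This only illustrates the size of the value; the actual check that emits the warning lives inside HTSlib.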
If HTSlib is compiled with the option -DVCF_ALLOW_INT64, the value is parsed correctly:
$ bcftools view -H GCF_000001405.40.gz -r NC_000001.11:6259533-6259533
NC_000001.11 6259533 rs2148352434 C T . . RS=2148352434;dbSNPBuildID=156;SSR=0;GENEINFO=GPR153:387509;VC=SNV;INT;R5;GNO;FREQ=1000Genomes:0.9998,0.0001562
However, the record can then no longer be written as binary VCF (BCF), which is a huge problem:
$ bcftools view -Ou GCF_000001405.40.gz -r NC_000001.11:6259533-6259533 | bcftools view -H
[E::bcf_write] Data at NC_000001.11:6259533 contains 64-bit values not representable in BCF. Please use VCF instead
[main_vcfview] Error: cannot write to (null)
Is there a discussion in samtools/hts-specs about updating the BCF specification to support 64-bit values?
Changing the BCF specification is not an easy task and may take a long time even if there is the will to do it. The problem could be addressed more easily on the dbSNP side if INFO/RS were a string rather than an integer.
Hi,
I am getting the same error when trying to annotate with dbSNP 156. I understand from the discussion that this issue can't be fixed in the short term. But can you help me with compiling HTSlib with the option -DVCF_ALLOW_INT64? I read the documentation, and it states that this option needs to be added manually in the makefile. I tried that and it's not working. I made this change in the makefile in the htslib-1.20 folder shipped with bcftools-1.20. Since I have no experience in developing with C++ and make, could you please specify the exact changes to be made in the makefile? Is this correct?
CFLAGS = -g -Wall -O2 -fvisibility=hidden -DVCF_ALLOW_INT64=1
Yes, that is correct: one must compile with -DVCF_ALLOW_INT64. Try forcing recompilation of vcf.c with touch vcf.c, check what the standard make command line looks like, and add -DVCF_ALLOW_INT64 to it. Note that this option has not been terribly well tested; hopefully the code has not deteriorated too much.
Perhaps a simpler workaround is to edit the VCF header using the reheader command, changing the offending tag to Type=String:
bcftools view -h file.vcf.gz > hdr.txt
# edit hdr.txt and change the offending tag to Type=String
bcftools reheader -h hdr.txt -o new.bcf file.vcf.gz
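The manual header edit could also be scripted. The following is a minimal sketch, assuming the header declares the tag in the form ##INFO=<ID=RS,Number=...,Type=Integer,...>. The function name rs_type_to_string is hypothetical, and the two header lines in the demonstration are made up for illustration.

```shell
# Hypothetical helper: reads a VCF header on stdin and rewrites the INFO/RS
# declaration from Type=Integer to Type=String; all other lines pass through.
# Assumes the tag is declared as "##INFO=<ID=RS,Number=...,Type=Integer,...".
rs_type_to_string() {
    sed -E 's/^(##INFO=<ID=RS,Number=[^,]+,Type=)Integer/\1String/'
}

# Small demonstration on a made-up two-line header fragment:
# only the RS line should change.
printf '%s\n' \
    '##INFO=<ID=RS,Number=1,Type=Integer,Description="example">' \
    '##INFO=<ID=SSR,Number=1,Type=Integer,Description="example">' |
    rs_type_to_string
```

With the helper defined, bcftools view -h file.vcf.gz | rs_type_to_string > hdr.txt would produce the edited header for the reheader step above.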
Hi,
Thanks for the solutions. I tried to recompile with touch vcf.c and the addition of -DVCF_ALLOW_INT64 in the makefile, but the error persisted.
The second solution, changing the tag to Type=String, worked, and I could successfully use both bcftools view and bcftools annotate.
Thank you very much for your help.