bcftools
bcftools copied to clipboard
Bcftools norm prints only the header
Hello,
I used bcftools norm option to split multiallelic variants and to normalized indels. However, the output is an empty vcf file and only the header is printed.
I am using bcftools 1.19 + htslib-1.19 Here is the command:
bcftools norm -m-both --fasta-ref "${ref_name}" --threads "${num_threads}" -O v "result/${INPUTVCF%.vcf}.single_sample.sorted.vcf" > "result/${INPUTVCF%.vcf}.normalized.vcf"
this was caused by wrong input vcf
Hello,
I tested again and indeed there seems to be an issue that is only for one of my 2 test cases though. This is running in a pipeline in nexflow so ! indicates nexflow variables.
bcftools view -s "!{sample_id}" -O v "${INPUTVCF}" | bcftools sort -O v | \
bcftools norm -m -both --multi-overlaps 0 -f "!{ref_name}" --threads "!{num_threads}" -o "result/${INPUTVCF%.vcf}.normalized.vcf"
The problem seems to be indeed in bcftools norm. In fact what I observed before (printing only the header) was the stdout. However, checking the stderr file there is the follwing indication:
Error: wrong number of fields in INFO/AS_MQRankSum at chr1:1793064, expected 3, found 1
which is the first variant in the original file which contains 3 samples (1 tumor and 2 normal), so I select the tumor and want to normalize. This is how the original file looks:
##contig=<ID=HLA-DRB1*15:03:01:02,length=11569,assembly=Homo_sapiens_assembly38.fasta>
##contig=<ID=HLA-DRB1*16:02:01,length=11005,assembly=Homo_sapiens_assembly38.fasta>
##filtering_status=Warning: unfiltered Mutect 2 calls. Please run FilterMutectCalls to remove false positives.
##normal_sample=126697
##reference=file:///tmp/nxf.V9ygJBnOTq/Homo_sapiens_assembly38.fasta
##source=LeftAlignAndTrimVariants
##tumor_sample=126695
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 126695 126697 126698
chr1 1793064 . CA C,CAA,CAAA . . AS_MQRankSum=0.000;AS_SB_TABLE=240,8|62,0|188,0|27,1;DP=659;ECNT=1;MBQ=37,37,38,38;MFRL=391,416,380,384;MMQ=60,60,60,60;MPOS=32,36,38;MQ0=0;MQRankSum=0.000;NALOD=-4.095e+01,-1.794e+02,-1.139e+01;NLOD=6.69,-1.902e+02,7.26;POPAF=6.00,6.00,6.00;RPA=13,12,14,15;RU=A;SOR=3.330;STR;TLOD=36.58,157.95,14.02 GT:AD:AF:DP:F1R2:F2R1:SB 0/1/2/3:105,30,86,15:0.118,0.372,0.062:236:44,17,47,10:60,12,39,5:103,2,131,0 0/0:76,19,40,8:0.132,0.282,0.060:143:36,9,19,4:38,10,21,4:71,5,67,0 0/0:67,13,62,5:0.082,0.437,0.034:147:39,6,37,4:27,7,25,1:66,1,79,1
chr1 1815530 . A T . . AS_MQRankSum=0.000;AS_SB_TABLE=57,35|9,0;DP=120;ECNT=1;MBQ=37,21;MFRL=338,345;MMQ=60,60;MPOS=5;MQ0=0;MQRankSum=0.000;NALOD=-3.343e+00;NLOD=7.04;POPAF=6.00;SOR=3.677;TLOD=3.63 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:39,4:0.105:43:22,3:17,1:25,14,4,0 0/0:37,4:0.111:41:17,3:20,1:23,14,4,0 0/0:16,1:0.044:17:11,1:5,0:9,7,1,0
chr1 1817466 . A T . . AS_MQRankSum=0.000;AS_SB_TABLE=522,0|19,0;DP=544;ECNT=1;MBQ=38,24;MFRL=450,340;MMQ=60,60;MPOS=5;MQ0=0;MQRankSum=0.000;NALOD=-7.007e+00;NLOD=69.05;POPAF=6.00;SOR=0.001;TLOD=7.71 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:210,9:0.043:219:106,5:100,4:210,0,9,0 0/0:159,3:0.023:162:79,2:75,1:159,0,3,0 0/0:153,7:0.047:160:86,4:66,3:153,0,7,0
chr1 1818301 . C A . . AS_MQRankSum=0.000;AS_SB_TABLE=1313,528|0,273;DP=2248;ECNT=2;MBQ=30,12;MFRL=280,289;MMQ=60,60;MPOS=12;MQ0=0;MQRankSum=0.000;NALOD=-1.126e+01;NLOD=209.63;POPAF=6.00;SOR=11.226;TLOD=28.37 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:725,134:0.066:859:280,13:289,10:547,178,0,134 0/0:631,50:0.019:681:223,8:228,3:397,234,0,50 0/0:485,89:0.056:574:197,0:185,5:369,116,0,89
chr1 1818305 . TA T,TAA,TAAA . . AS_MQRankSum=0.000;AS_SB_TABLE=139,262|122,283|78,161|30,61;DP=2431;ECNT=2;MBQ=20,25,20,28;MFRL=280,286,277,322;MMQ=60,60,60,60;MPOS=23,21,20;MQ0=0;MQRankSum=0.000;NALOD=-1.099e+02,-6.413e+01,-1.463e+01;NLOD=-9.531e+01,-4.782e+01,23.02;POPAF=6.00,6.00,6.00;RPA=24,23,25,26;RU=A;SOR=0.859;STR;TLOD=108.11,85.75,16.71 GT:AD:AF:DP:F1R2:F2R1:SB 0/1/2/3:183,186,133,47:0.297,0.250,0.084:549:63,68,40,18:83,88,51,11:68,115,124,242 0/0:83,113,32,8:0.420,0.134,0.025:236:29,45,11,2:35,48,17,2:27,56,41,112 0/0:135,106,74,36:0.272,0.215,0.106:351:44,34,22,15:52,42,30,12:44,91,65,151
chr1 1824886 . TTTAGATATATCTGCTCCATCTAGCAGTCTATCAGAAGCTGTAGATTCCAATCATCACCGCTAGGGAAGATC T . . AS_MQRankSum=0.099;AS_SB_TABLE=839,284|8,1;DP=1281;ECNT=1;MBQ=36,24;MFRL=364,268;MMQ=60,60;MPOS=6;MQ0=0;MQRankSum=0.099;NALOD=-5.626e+00;NLOD=146.26;POPAF=6.00;SOR=1.203;TLOD=4.26 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:483,3:9.849e-03:486:172,0:143,3:355,128,3,0 0/0:337,2:9.777e-03:339:112,0:104,1:262,75,2,0 0/0:303,4:0.016:307:98,0:89,2:222,81,3,1
chr1 2558208 . C T . . AS_MQRankSum=0.000;AS_SB_TABLE=408,4|16,0;DP=568;ECNT=2;MBQ=34,25;MFRL=366,333;MMQ=60,60;MPOS=5;MQ0=0;MQRankSum=0.000;NALOD=-3.856e+00;NLOD=52.18;POPAF=6.00;SOR=0.042;TLOD=8.44 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:186,9:0.048:195:109,5:76,4:183,3,9,0 0/0:124,3:0.028:127:66,2:58,1:124,0,3,0 0/0:102,4:0.041:106:45,2:56,2:101,1,4,0
chr1 2558272 . G T . . AS_MQRankSum=0.000;AS_SB_TABLE=119,72|9,0;DP=253;ECNT=2;MBQ=38,23;MFRL=312,285;MMQ=60,60;MPOS=3;MQ0=0;MQRankSum=0.000;NALOD=-7.642e-01;NLOD=24.27;POPAF=6.00;SOR=3.638;TLOD=5.56 GT:AD:AF:DP:F1R2:F2R1:SB 0/1:89,6:0.067:95:48,2:39,4:50,39,6,0 0/0:56,2:0.042:58:33,1:22,1:39,17,2,0 0/0:46,1:0.034:47:19,0:26,1:30,16,1,0
chr1 2559459 . C A,T . . AS_MQRankSum=-0.002;AS_SB_TABLE=2474,2223|2361,534|9,538;DP=8476;ECNT=1;MBQ=20,38,9;MFRL=258,259,256;MMQ=60,60,60;MPOS=38,19;MQ0=0;MQRankSum=-0.002;NALOD=-4.677e+03,-3.075e+01;NLOD=-4.716e+03,297.50;POPAF=6.00,6.00;SOR=1.598;TLOD=3613.74,17.54 GT:AD:AF:DP:F1R2:F2R1:SB 0/1/2:2051,1250,220:0.403,0.014:3521:934,593,9:723,464,8:1100,951,1028,442 0/0:93,1645,324:0.932,0.067:2062:0,743,18:0,637,9:0,93,1341,628 0/0:2553,0,3:4.555e-04,5.825e-04:2556:1208,0,0:885,0,1:1374,1179,1,2
After the command above you get the following output:
##contig=<ID=HLA-DRB1*15:03:01:01,length=11567,assembly=Homo_sapiens_assembly38.fasta>
##contig=<ID=HLA-DRB1*15:03:01:02,length=11569,assembly=Homo_sapiens_assembly38.fasta>
##contig=<ID=HLA-DRB1*16:02:01,length=11005,assembly=Homo_sapiens_assembly38.fasta>
##filtering_status=Warning: unfiltered Mutect 2 calls. Please run FilterMutectCalls to remove false positives.
##normal_sample=126697
##reference=file:///tmp/nxf.V9ygJBnOTq/Homo_sapiens_assembly38.fasta
##source=LeftAlignAndTrimVariants
##tumor_sample=126695
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##bcftools_viewVersion=1.13+htslib-1.13+ds
##bcftools_viewCommand=view -s 126695 126695.Mutect2.merged.vcf; Date=Thu Mar 28 13:43:57 2024
##bcftools_normVersion=1.13+htslib-1.13+ds
##bcftools_normCommand=norm -m- sorted_single_sample.vcf; Date=Thu Mar 28 14:28:41 2024
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 126695
Error: wrong number of fields in INFO/AS_MQRankSum at chr1:1793064, expected 3, found 1
Any idea what the problem is?
You are not showing how INFO/AS_MQRankSum is defined in the header, but the error message is clear: the program expects three values but found only one. Since at that position there are three alternate alleles, most likely the tag is defined as Number=A, meaning that there should be as many values as there are alternate alleles.
To get past this error, the program that produced the VCF should be fixed or the problem reported to the developer.
A quick workaround is to either remove the tag completely (bcftools annotate -x AS_MQRankSum
) or redefine it as having variable number of values using bcftools reheader
bcftools view --header-only file.vcf > hdr.txt
# edit hdr.txt and replace Number=A with Number=.
bcftools reheader -h hdr.txt file.vcf > fixed.vcf