bcftools
Different outputs from the **bcftools norm** command with different versions of bcftools
Dear developers,
I get different outputs from the bcftools norm command depending on which version of bcftools I use.
If I use bcftools version 1.7-2, the ALT column of some rows contains two alleles separated by a comma:
##bcftools_normVersion=1.7+htslib-1.7-2
##bcftools_normCommand=norm -f /mnt/projects/schmidtf/analysis/Radhika/renxi/bin/library/GRCh38.fa -d all -c sx -O v -o using22sentrix_GS_results_from_RX.noIndel.2Alleles.noINDEL.recode.withChr.setref.nodup.vcf using22sentrix_GS_results_from_RX.noIndel.2Alleles.noINDEL.recode.withChr.vcf.gz; Date=Wed Jul 6 04:39:41 2022
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 1_7516 2_8897 3_9715 4_11564 5_NBM27087 6_NBM90222 7_NBM250287 8_P696 9_NBM9001 10_NBM9017 11_P631 12_7751 13_8813 14_9047 15_9078 16_10550 17_10761 18_11723 19_P213 20_P625 21_9182 22_10712
chr1 817341 rs3131972 A T,C . . PR GT 2/2 2/2 2/1 2/2 2/2 2/2 2/2 2/1 2/1 2/1 2/1 2/1 2/1 2/1 2/2 2/1 2/2 2/1 2/2 2/2 1/1 2/2
chr1 823656 GSA-rs114525117 G A . . PR GT 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/1 0/0 0/1 0/0 0/0 0/0 0/1 0/0 0/0 0/0 0/0 0/0 0/0
chr1 858952 rs12127425 G A . . PR GT 0/0 0/0 0/0 0/0 0/1 0/0 0/0 0/0 0/0 0/1 0/0 0/0 0/0 0/0 0/1 0/0 0/1 0/0 0/1 1/1 0/0 0/0
chr1 903175 rs4970383 C T,G . . PR GT 2/1 2/2 2/1 2/2 2/2 2/1 2/1 2/1 2/2 2/2 2/1 2/1 2/2 2/2 2/2 2/2 2/2 2/1 2/1 2/1 2/1 2/1
However, if I analyze the same input file with the latest version, 1.15.1, the ALT column contains only one allele:
##bcftools_normVersion=1.15.1+htslib-1.15.1
##bcftools_normCommand=norm -f /mnt/projects/schmidtf/analysis/Radhika/renxi/bin/library/GRCh38.fa -d all -c sx -O v -o using22sentrix_GS_results_from_RX.noIndel.2Alleles.noINDEL.recode.withChr.setref.nodup.vcf using22sentrix_GS_results_from_RX.noIndel.2Alleles.noINDEL.recode.withChr.vcf.gz; Date=Wed Jul 6 03:16:32 2022
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 1_7516 2_8897 3_9715 4_11564 5_NBM27087 6_NBM90222 7_NBM250287 8_P696 9_NBM9001 10_NBM9017 11_P631 12_7751 13_8813 14_9047 15_9078 16_10550 17_10761 18_11723 19_P213 20_P625 21_9182 22_10712
chr1 817341 rs3131972 A T . . PR GT 0/0 0/0 0/1 0/0 0/0 0/0 0/0 0/1 0/1 0/1 0/1 0/1 0/1 0/1 0/0 0/1 0/0 0/1 0/0 0/0 1/1 0/0
chr1 823656 GSA-rs114525117 G A . . PR GT 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/1 0/0 0/1 0/0 0/0 0/0 0/1 0/0 0/0 0/0 0/0 0/0 0/0
chr1 858952 rs12127425 G A . . PR GT 0/0 0/0 0/0 0/0 0/1 0/0 0/0 0/0 0/0 0/1 0/0 0/0 0/0 0/0 0/1 0/0 0/1 0/0 0/1 1/1 0/0 0/0
chr1 903175 rs4970383 C T . . PR GT 0/1 0/0 0/1 0/0 0/0 0/1 0/1 0/1 0/0 0/0 0/1 0/1 0/0 0/0 0/0 0/0 0/0 0/1 0/1 0/1 0/1 0/1
May I ask why there is such a difference? It could affect the results of downstream analyses.
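To see whether the two outputs are just different encodings of the same calls, it can help to translate each GT index back into an allele using that row's REF and ALT columns (in VCF, index 0 is REF, 1 is the first ALT, 2 is the second ALT). The Python sketch below does this for rs3131972 using the genotypes exactly as pasted above. `decode_gt` is a hypothetical helper, not a bcftools feature, and it assumes unphased diploid calls.

```python
# Hypothetical helper (not part of bcftools): translate GT indices into
# allele letters using the record's REF and ALT columns.
def decode_gt(gt: str, ref: str, alt: str) -> str:
    """Return e.g. "C/T" for gt="2/1", ref="A", alt="T,C".

    Index 0 is REF, 1 the first ALT, 2 the second ALT, and so on.
    Missing calls (".") are passed through unchanged. Alleles are sorted
    so that unphased "T/C" and "C/T" compare equal.
    """
    alleles = [ref] + alt.split(",")
    calls = [a if a == "." else alleles[int(a)]
             for a in gt.replace("|", "/").split("/")]
    return "/".join(sorted(calls))


# rs3131972 (chr1:817341), 22 samples, copied from the two outputs above.
gts_v17 = ("2/2 2/2 2/1 2/2 2/2 2/2 2/2 2/1 2/1 2/1 2/1 2/1 "
           "2/1 2/1 2/2 2/1 2/2 2/1 2/2 2/2 1/1 2/2").split()
gts_v115 = ("0/0 0/0 0/1 0/0 0/0 0/0 0/0 0/1 0/1 0/1 0/1 0/1 "
            "0/1 0/1 0/0 0/1 0/0 0/1 0/0 0/0 1/1 0/0").split()

# 1.7 wrote REF=A, ALT=T,C; 1.15.1 wrote REF=A, ALT=T.
decoded_v17 = [decode_gt(g, "A", "T,C") for g in gts_v17]
decoded_v115 = [decode_gt(g, "A", "T") for g in gts_v115]

# 1-based sample numbers whose decoded genotype differs between versions.
differ = [i + 1 for i, (a, b) in enumerate(zip(decoded_v17, decoded_v115))
          if a != b]
print(f"{len(differ)} of {len(gts_v17)} samples decode to different alleles")
for i in (1, 3, 21):
    print(i, decoded_v17[i - 1], decoded_v115[i - 1])
```

For this row the index patterns line up one-to-one (every 2 in the 1.7 output sits where 1.15.1 has a 0), yet that allele is C in one file and A in the other, so 21 of the 22 samples end up with a different genotype; only sample 21 (T/T) agrees. This shows how much the version matters downstream, but it does not by itself say which output is correct.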
Thank you, Ren Xi
Err, can you please provide a small test case to reproduce the problem?
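One way to produce such a test case is to keep the VCF header and only a handful of the affected records, for example rs3131972 (chr1:817341) and rs4970383 (chr1:903175), and share that file together with the exact command from the `##bcftools_normCommand` line. The Python sketch below assumes a tab-delimited input VCF, either plain or bgzip-compressed; `extract_sites` and the output file name are hypothetical.

```python
import gzip


# Hypothetical helper: copy the VCF header plus only the records at the
# requested (CHROM, POS) sites into a new, uncompressed VCF.
# gzip.open can also read bgzip-compressed .vcf.gz files, because BGZF is
# a series of concatenated gzip members.
def extract_sites(src: str, dst: str, sites: set[tuple[str, int]]) -> int:
    opener = gzip.open if src.endswith(".gz") else open
    kept = 0
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith("#"):
                fout.write(line)  # keep all ## meta lines and #CHROM header
                continue
            if not line.strip():
                continue  # skip stray blank lines
            chrom, pos = line.split("\t", 2)[:2]
            if (chrom, int(pos)) in sites:
                fout.write(line)
                kept += 1
    return kept  # number of data records written
```

For example, `extract_sites("using22sentrix_GS_results_from_RX.noIndel.2Alleles.noINDEL.recode.withChr.vcf.gz", "test_case.vcf", {("chr1", 817341), ("chr1", 903175)})` would write the header and those two records. The whole input is streamed once, so no index file is needed.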