Is this expected bcftools norm behaviour for joining biallelics into a multiallelic?
Example with multiallelic: cat test20.vcf ##fileformat=VCFv4.2 ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> ##contig=<ID=chr7> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1 chr7 117559591 Indel TCTT T,TTCT . . . GT 0/0
Step1: bcftools norm -m-any -f ../hg38/hg38.fa test20.vcf Splitting works OK.
###fileformat=VCFv4.2 ##FILTER=<ID=PASS,Description="All filters passed"> ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> ##contig=<ID=chr7> ##bcftools_normVersion=1.17+htslib-1.17 ##bcftools_normCommand=norm -m-any -f ../hg38/hg38.fa test20.vcf; Date=Sat Mar 11 15:36:30 2023 #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1 chr7 117559590 Indel ATCT A . . . GT 0/0 chr7 117559592 Indel CT TC . . . GT 0/0 Lines total/split/realigned/skipped: 1/1/2/0
Step2: Splitting and joining but joining does not work. Why?
bcftools norm -m-any -f ../hg38/hg38.fa test20.vcf | bcftools norm -m+any -f ../hg38/hg38.fa Lines total/split/realigned/skipped: 1/1/2/0 ##fileformat=VCFv4.2 ##FILTER=<ID=PASS,Description="All filters passed"> ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> ##contig=<ID=chr7> ##bcftools_normVersion=1.17+htslib-1.17 ##bcftools_normCommand=norm -m-any -f ../hg38/hg38.fa test20.vcf; Date=Sat Mar 11 15:37:29 2023 ##bcftools_normCommand=norm -m+any -f ../hg38/hg38.fa; Date=Sat Mar 11 15:37:29 2023 #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1 chr7 117559590 Indel ATCT A . . . GT 0/0 chr7 117559592 Indel CT TC . . . GT 0/0 Lines total/split/realigned/skipped: 2/0/0/0
The above command correctly splits my multiallelic record into 2 biallelic records and then it should join them into a single multiallelic record but it does not. Am I doing something wrong or is this an issue with bcftools norm? Thanks.
This is the expected behavior for now, the program is not smart enough to join the two records split across different positions. The inverse of bcftools norm --atomize is needed for this.
Just to cross-reference related issues, see also https://github.com/samtools/bcftools/issues/327, https://github.com/samtools/bcftools/issues/1724, https://github.com/samtools/bcftools/issues/1843