
The REF prefixes differ: T vs C (1,1) Failed to merge alleles at...

Open MrLocuace opened this issue 5 years ago • 3 comments

Hello, I am trying to merge two VCF files with bcftools. This is the command I am using: /bcftools merge -m id -o merged.vcf.gz pop1.vcf.gz pop2.vcf.gz

I get the following message:

The REF prefixes differ: T vs C (1,1) Failed to merge alleles at 10:252693 in /path/merged.vcf.gz

Any help would be very welcome! Thanks very much in advance.

Here are the headers of the two files and their first few records:

pop1.vcf.gz:

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##filedate=20180903
##source="beagle.jar (r1399)"
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated Allele Frequencies">
##INFO=<ID=AR2,Number=1,Type=Float,Description="Allelic R-Squared: estimated correlation between most probable ALT dose and true ALT dose">
##INFO=<ID=DR2,Number=1,Type=Float,Description="Dosage R-Squared: estimated correlation between estimated ALT dose [P(RA) + 2*P(AA)] and true ALT dose">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=10>
##bcftools_annotateVersion=1.9+htslib-1.9
##bcftools_annotateCommand=annotate -x FORMAT/DS,FORMAT/GP -o /path/pop1_195_10_GT.vcf /path/pop1_195_10_phased.vcf.gz; Date=Tue Sep 4 15:50:41 2018
##contig=<ID=11>
##contig=<ID=12>
##contig=<ID=13>
##contig=<ID=14>
##contig=<ID=15>
##contig=<ID=16>
##contig=<ID=17>
##contig=<ID=18>
##contig=<ID=19>
##contig=<ID=1>
##contig=<ID=20>
##contig=<ID=21>
##contig=<ID=22>
##contig=<ID=2>
##contig=<ID=3>
##contig=<ID=4>
##contig=<ID=5>
##contig=<ID=6>
##contig=<ID=7>
##contig=<ID=8>
##contig=<ID=9>
##bcftools_concatVersion=1.9+htslib-1.9
##bcftools_concatCommand=concat path/pop1_195_10_GT.vcf.gz /path/pop1_195_11_GT.vcf.gz /path/pop1_195_12_GT.vcf.gz /path/pop1_195_13_GT.vcf.gz /path/pop1_195_14_GT.vcf.gz /path/pop1_195_15_GT.vcf.gz /path/pop1_195_16_GT.vcf.gz /path/pop1_195_17_GT.vcf.gz /path/pop1_195_18_GT.vcf.gz /path/pop1_195_19_GT.vcf.gz /path/pop1_195_1_GT.vcf.gz /path/pop1_195_20_GT.vcf.gz /path/pop1_195_21_GT.vcf.gz /path/pop1_195_22_GT.vcf.gz /path/pop1_195_2_GT.vcf.gz /path/pop1_195_3_GT.vcf.gz //path/pop1_195_4_GT.vcf.gz /path/pop1_195_5_GT.vcf.gz /path/pop1_195_6_GT.vcf.gz /path/pop1_195_7_GT.vcf.gz /path/pop1_195_8_GT.vcf.gz /path/pop1_195_9_GT.vcf.gz; Date=Tue Sep 4 16:13:54 2018
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	spa_1	spa_2	spa_3
10	144847	rs11253478	C	T	.	PASS	AR2=1;DR2=1;AF=0.082	GT	0|0	0|0	0|1
10	244561	rs2448366	G	A	.	PASS	AR2=1;DR2=1;AF=0.446	GT	1|1	0|1	0|0
10	252693	rs2379078	T	C	.	PASS	AR2=1;DR2=1;AF=0.238	GT	0|0	0|0	0|1

pop2.vcf.gz:

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##filedate=20180830
##source="beagle.jar (r1399)"
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated Allele Frequencies">
##INFO=<ID=AR2,Number=1,Type=Float,Description="Allelic R-Squared: estimated correlation between most probable ALT dose and true ALT dose">
##INFO=<ID=DR2,Number=1,Type=Float,Description="Dosage R-Squared: estimated correlation between estimated ALT dose [P(RA) + 2*P(AA)] and true ALT dose">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=10>
##bcftools_annotateVersion=1.9+htslib-1.9
##bcftools_annotateCommand=annotate -x FORMAT/DS,FORMAT/GP -o /path/pop2_10_GT.vcf /path/pop2_10_phased.vcf.gz ' '; Date=Tue Sep 4 12:38:31 2018
##contig=<ID=11>
##contig=<ID=12>
##contig=<ID=13>
##contig=<ID=14>
##contig=<ID=15>
##contig=<ID=16>
##contig=<ID=17>
##contig=<ID=18>
##contig=<ID=19>
##contig=<ID=1>
##contig=<ID=20>
##contig=<ID=21>
##contig=<ID=22>
##contig=<ID=2>
##contig=<ID=3>
##contig=<ID=4>
##contig=<ID=5>
##contig=<ID=6>
##contig=<ID=7>
##contig=<ID=8>
##contig=<ID=9>
##bcftools_concatVersion=1.9+htslib-1.9
##bcftools_concatCommand=concat /path/pop2_10_GT.vcf.gz /path/pop2_11_GT.vcf.gz /path/pop2_12_GT.vcf.gz /path/pop2_13_GT.vcf.gz /path/pop2_14_GT.vcf.gz /path/pop2_15_GT.vcf.gz /path/pop2_16_GT.vcf.gz /path/pop2_17_GT.vcf.gz /path/pop2_18_GT.vcf.gz /path/pop2_19_GT.vcf.gz /path/pop2_1_GT.vcf.gz /path/pop2_20_GT.vcf.gz /path/pop2_21_GT.vcf.gz /path/pop2_22_GT.vcf.gz /path/pop2_2_GT.vcf.gz /path/pop2_3_GT.vcf.gz /path/pop2_4_GT.vcf.gz /path/pop2_5_GT.vcf.gz /path/pop2_6_GT.vcf.gz /path/pop2_7_GT.vcf.gz /path/pop2_8_GT.vcf.gz /path/pop2_9_GT.vcf.gz; Date=Tue Sep 4 16:14:14 2018
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	ind_1	ind_2	ind_3
10	144847	rs11253478	C	T	.	PASS	AR2=1;DR2=1;AF=0.474	GT	0|0	0|1	0|0
10	244561	rs2448366	G	A	.	PASS	AR2=1;DR2=1;AF=0.321	GT	1|1	1|0	0|1
10	252693	rs2379078	C	T	.	PASS	AR2=1;DR2=1;AF=0.359	GT	1|1	1|0	0|1

MrLocuace, Sep 05 '18 18:09
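The failing site is the third record in each file: at 10:252693 (rs2379078), pop1 has REF=T, ALT=C while pop2 has REF=C, ALT=T, so the same SNP is encoded with REF and ALT swapped. As the error text suggests, bcftools merge can reconcile REF alleles of different lengths when one is a prefix of the other, but T and C share no prefix, so it stops. Below is a minimal Python sketch for listing such sites before merging. It assumes plain-text, tab-separated VCF records (a .vcf.gz would have to be decompressed first), and parse_records and ref_conflicts are hypothetical helpers, not part of bcftools.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (CHROM, POS)


def parse_records(vcf_text: str) -> Dict[Key, Tuple[str, str]]:
    """Map (CHROM, POS) -> (REF, ALT) for every data line of a plain-text VCF."""
    records: Dict[Key, Tuple[str, str]] = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):  # skip meta-information and header lines
            continue
        fields = line.split("\t")  # VCF columns are tab-separated
        records[(fields[0], int(fields[1]))] = (fields[3], fields[4])
    return records


def ref_conflicts(vcf_a: str, vcf_b: str) -> List[Tuple[Key, str]]:
    """List shared positions whose REF differs, flagging plain REF/ALT swaps."""
    a, b = parse_records(vcf_a), parse_records(vcf_b)
    conflicts = []
    for key in sorted(a.keys() & b.keys()):  # positions present in both files
        (ref_a, alt_a), (ref_b, alt_b) = a[key], b[key]
        if ref_a == ref_b:
            continue
        kind = "swapped" if (ref_a, alt_a) == (alt_b, ref_b) else "mismatch"
        conflicts.append((key, f"{kind}: {ref_a}>{alt_a} vs {ref_b}>{alt_b}"))
    return conflicts


# The three records shown above for each population (header lines omitted).
pop1 = "\n".join([
    "10\t144847\trs11253478\tC\tT\t.\tPASS\tAR2=1;DR2=1;AF=0.082\tGT\t0|0\t0|0\t0|1",
    "10\t244561\trs2448366\tG\tA\t.\tPASS\tAR2=1;DR2=1;AF=0.446\tGT\t1|1\t0|1\t0|0",
    "10\t252693\trs2379078\tT\tC\t.\tPASS\tAR2=1;DR2=1;AF=0.238\tGT\t0|0\t0|0\t0|1",
])
pop2 = "\n".join([
    "10\t144847\trs11253478\tC\tT\t.\tPASS\tAR2=1;DR2=1;AF=0.474\tGT\t0|0\t0|1\t0|0",
    "10\t244561\trs2448366\tG\tA\t.\tPASS\tAR2=1;DR2=1;AF=0.321\tGT\t1|1\t1|0\t0|1",
    "10\t252693\trs2379078\tC\tT\t.\tPASS\tAR2=1;DR2=1;AF=0.359\tGT\t1|1\t1|0\t0|1",
])

for (chrom, pos), description in ref_conflicts(pop1, pop2):
    print(f"{chrom}:{pos} {description}")
# prints: 10:252693 swapped: T>C vs C>T
```

The sketch only finds the disagreement; which orientation is correct has to be decided against the reference genome (for example, bcftools norm -f ref.fa --check-ref w warns about REF alleles that do not match the reference). Swapping REF and ALT in one file without also flipping its genotypes would change their meaning.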

These don't look like Delly BCF files, so please post this in the bcftools repository. Thanks.

tobiasrausch, Sep 05 '18 18:09

Incidentally, I got the same error just yesterday, and it's definitely Delly output. In the "sample1_sv.bcf" file, I have this line:

chr1    40556015        DEL00000344     T       <DEL>   979     PASS    IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.6;END=40562907;PE=18;MAPQ=60;CT=3to5;CIPOS=-73,73;CIEND=-73,73     GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-96.6547,0,-130.769:10000:PASS:439:447:457:1:24:20:0:0

and in the "sample2_cnv.bcf" file, I get:

chr1    40556015        CNV00000013     N       <CNV>   128     PASS    IMPRECISE;SVTYPE=CNV;SVMETHOD=EMBL.DELLYv1.1.6;END=40562907;CIPOS=-180,180;CIEND=-180,180;MP=0.640122   GT:CN:CNL:GQ:FT:RDCN:RDSD       ./.:1:-13.8945,0,-12.7901,-52.2648,-118.424,-211.268,-1000.49,-1000.49,-1000.49,-1000.49:128:PASS:1.02069:0.127574

Same position, two different values for REF!

bcftools gets very upset about this, and I suppose rightly so: two VCF entries at the same position should have the same REF!

DaGaMs, Nov 23 '23 12:11

Yes, for the CNV module delly does not yet do the reference lookup. That's indeed something that should be fixed.

tobiasrausch, Nov 23 '23 13:11
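The missing reference lookup amounts to replacing the placeholder REF (N in the CNV record) with the reference base at POS, which would give the DEL and CNV records at chr1:40556015 the same REF. Below is a minimal Python sketch under stated assumptions: the reference is a hypothetical in-memory dict of chromosome name to sequence (in practice it would be read from an indexed FASTA, e.g. with pysam's FastaFile.fetch), records are tab-separated VCF lines, and only records with symbolic ALT alleles such as <CNV> are modified. It illustrates the idea and is not Delly's implementation; fill_symbolic_ref is a hypothetical name.

```python
from typing import Dict


def fill_symbolic_ref(record: str, reference: Dict[str, str]) -> str:
    """Set REF of a symbolic-allele record (e.g. ALT=<CNV>) to the reference base at POS.

    `reference` is a hypothetical in-memory mapping of chromosome -> sequence;
    records without a symbolic ALT are returned unchanged.
    """
    fields = record.rstrip("\n").split("\t")
    chrom, pos, alt = fields[0], int(fields[1]), fields[4]
    if alt.startswith("<") and alt.endswith(">"):
        # VCF POS is 1-based, Python string indices are 0-based.
        fields[3] = reference[chrom][pos - 1].upper()
    return "\t".join(fields)


# Toy example: a made-up 8 bp chromosome and a CNV record with the placeholder REF "N".
toy_reference = {"chr1": "ACGTACGT"}
cnv = "chr1\t3\tCNV_toy\tN\t<CNV>\t128\tPASS\tSVTYPE=CNV;END=6"
print(fill_symbolic_ref(cnv, toy_reference).split("\t")[3])  # prints: G
```

Applied to both the SV and the CNV output against the same reference FASTA, a lookup like this should make records that share a position also share a REF, which is the condition bcftools checks when merging.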