
annotate ID for multi-allelic sites

Open ikarus97 opened this issue 1 year ago • 1 comments

I found that the ID column of a multi-allelic site in the source file is transferred only to the first allele in the target file.

My source annotation file:

chr21	5030278	rs1258851236	C	G,T	.	.	RS=1258851236

And my target file:

chr21	5030278	.	C	G	.	.	.
chr21	5030278	.	C	T	.	.	.

The command I used is as follows: bcftools annotate -c +ID -a [source file] [target file]

And I got:

chr21	5030278	rs1258851236	C	G	.	.	.
chr21	5030278	.	C	T	.	.	.

Shouldn't the ID (rs1258851236) be annotated to both lines in the target file?

The version I used: bcftools_annotateVersion=1.19+htslib-1.19

ikarus97 avatar Aug 20 '24 20:08 ikarus97

The program has a limitation: when a VCF is used as the source of annotations, it can match a line only once. You'd have to split the multiallelic records into biallelics (bcftools norm -m -) or create a tab-delimited file. I believe that would work.
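A rough sketch of the first workaround, splitting the source before annotating. The function name and file arguments are placeholders made up for illustration, and it assumes the source is a bgzip-compressed VCF ending in .vcf.gz; I have not run this on the files from the question.

```shell
# Sketch only: annotate_ids_split and its arguments are hypothetical names.
# Assumes bcftools is on PATH and $1 is a bgzip-compressed *.vcf.gz file.
annotate_ids_split() {
    src="$1"    # annotation source containing multiallelic records
    tgt="$2"    # target VCF with one ALT allele per line
    out="$3"    # annotated output (written as compressed VCF)
    split="${src%.vcf.gz}.split.vcf.gz"

    # Split multiallelic source records into biallelic ones,
    # so that each target line has its own source line to match.
    bcftools norm -m - -Oz -o "$split" "$src" &&
    # The annotation file has to be indexed before it can be used with -a.
    bcftools index -t "$split" &&
    # Same annotate call as in the question, now against the split source.
    bcftools annotate -a "$split" -c +ID -Oz -o "$out" "$tgt"
}
```

It would be called as annotate_ids_split source.vcf.gz target.vcf.gz annotated.vcf.gz. If the split works as expected, both the C>G and C>T lines in the target should end up with rs1258851236.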

pd3 avatar Sep 09 '24 12:09 pd3