vcfanno
Vcfanno does not use pipes to delimit multiple annotations for a single ALT allele
The by_alt operation should use pipes (perhaps this could be parameterized) to delimit multiple annotations for a single ALT allele. However, when adding BED annotations, vcfanno seems to use commas to delimit annotations:
root@job-FZzyJbj03gG7Y2bZGzK4GP39:/tmp# zcat chr1.vcf.gz
##fileformat=VCFv4.2
##hailversion=0.2.9-8588a25687af
##contig=<ID=1,length=249250621,assembly=GRCh37>
#CHROM POS ID REF ALT QUAL FILTER INFO
1 10177 rs367896724 A AC . . .
root@job-FZzyJbj03gG7Y2bZGzK4GP39:/tmp# zcat ENCFF171LNJ.sorted.bed.gz
chr1 10135 10285 . 0 . 28 -1 -1 75
chr1 10175 10325 . 0 . 20.0 -1 -1 75
root@job-FZzyJbj03gG7Y2bZGzK4GP39:/tmp# cat by-alt.conf.toml
[[annotation]]
names = [ "ENCFF171LNJ",]
file = "/tmp/ENCFF171LNJ.sorted.bed.gz"
columns = [ 7,]
ops = [ "by_alt",]
root@job-FZzyJbj03gG7Y2bZGzK4GP39:/tmp# vcfanno by-alt.conf.toml chr1.vcf.gz
=============================================
vcfanno version 0.3.1 [built with go1.11]
see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:115: found 1 sources from 1 files
vcfanno.go:143: using 2 worker threads to decompress query file
##fileformat=VCFv4.2
##contig=<ID=1,length=249250621,assembly=GRCh37>
##INFO=<ID=ENCFF171LNJ,Number=A,Type=String,Description="calculated by by_alt of overlapping values in column 7 from /tmp/ENCFF171LNJ.sorted.bed.gz">
##hailversion=0.2.9-8588a25687af
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
1 10177 rs367896724 A AC . . ENCFF171LNJ=28,20.0
vcfanno.go:241: annotated 1 variants in 0.00 seconds (3213.9 / second)
Expected INFO to equal ENCFF171LNJ=28|20.0
If you have encountered an error, please include:
- [ ] minimal conf and lua files that you are using.
- [ ] urls or actual files for annotations in conf file.
- [ ] minimal query file.
- [ ] the command you used to invoke vcfanno
- [ ] the full error message
This is an oversight and therefore a deficiency in vcfanno, but it doesn't make sense to use by_alt on a BED file (where you don't have ref and alt columns to indicate the exact allele).
That makes sense. If the INFO tag is for the whole locus, though, would it be possible to make the metadata line for INFO/ENCFF171LNJ say Number=. (cf. https://samtools.github.io/hts-specs/VCFv4.2.pdf)? It could also be useful to add a line to the documentation and/or print a warning to stdout about BED annotations (just a thought).
Alternatively, what do you think about duplicating the annotations across ALT alleles when users pass in by_alt + BEDs? Not ideal, but users would have control.
I think it should probably be an error to use by_alt with a file that doesn't have ref and alt columns. Why don't you use the concat op?
Good suggestion, will do
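For anyone landing here later, a minimal sketch of what the suggested change might look like: the same config from the report, with only the op swapped from by_alt to concat. The file path, column number, and name are copied from the original post. Per the vcfanno documentation, concat produces a comma-delimited list of the overlapping values; this has not been re-run against the files above, so treat it as an untested illustration.

```toml
# by-alt.conf.toml from the report, with the op changed to concat.
# concat reports all overlapping values for the locus as a whole,
# rather than per ALT allele, which suits a BED file that has no
# ref/alt columns.
[[annotation]]
file = "/tmp/ENCFF171LNJ.sorted.bed.gz"
names = [ "ENCFF171LNJ",]
columns = [ 7,]
ops = [ "concat",]
```

The invocation stays the same as in the report: vcfanno by-alt.conf.toml chr1.vcf.gz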