vcfanno
GTF annotation fails: index out of range error
Hi
I am trying to use a GTF file to annotate a VCF. I could not find an example of this use case, so I tried my best guess:
[[annotation]]
file="/home/mpschr/bin/bcbionextgen/data/genomes/Hsapiens/GRCh37/rnaseq/ref-transcripts.gtf.gz"
fields = ["gene_name"]
ops = ["self"]
names = ["gene_name"]
This generates an index out of range error:
mpschr/Documents/projects/rnaseq-savar/test-data/PE03_ID.pax5-exons.standalone-variants.vcf
=============================================
vcfanno version 0.2.2 [built with go1.8]
see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:114: found 16 sources from 4 files
panic: runtime error: index out of range
goroutine 108 [running]:
github.com/brentp/vcfanno/api.collect(0x7f48719e7058, 0xc420629440, 0xc42044f840, 0x4, 0x4, 0xc420091880, 0x1, 0x0, 0x0, 0x0, ...)
/home/brentp/go/src/github.com/brentp/vcfanno/api/api.go:302 +0x1418
github.com/brentp/vcfanno/api.(*Annotator).AnnotateOne(0xc42001d4c0, 0x92d4e0, 0xc420629440, 0x795401, 0x0, 0x0, 0x0, 0xc4205bacd0, 0xc42047c680)
/home/brentp/go/src/github.com/brentp/vcfanno/api/api.go:392 +0x1ed
github.com/brentp/vcfanno/api.(*Annotator).AnnotateEnds(0xc42001d4c0, 0x92d4e0, 0xc420629440, 0x0, 0x0, 0x10000, 0xc421306f20)
/home/brentp/go/src/github.com/brentp/vcfanno/api/api.go:718 +0xdda
main.main.func1(0x92d4e0, 0xc420629440)
/home/brentp/go/src/github.com/brentp/vcfanno/vcfanno.go:154 +0x71
github.com/brentp/irelate.PIRelate.func1.1(0xc4206293e0, 0xc420d2b980, 0x190, 0x190, 0xc423256300)
/home/brentp/go/src/github.com/brentp/irelate/parallel.go:202 +0x5f
created by github.com/brentp/irelate.PIRelate.func1
/home/brentp/go/src/github.com/brentp/irelate/parallel.go:207 +0x89
How would I use a gtf-file correctly?
For anything other than VCF, you'll have to use columns instead of fields, e.g.: columns=[8]
OK, so the value is not selected according to the name field? Particularly in the GTF different lines may have different values at a certain column, depending on the element which is represented on the line in question.
yeah, I've thought about that, but haven't had many people (AFAICT) using/wanting GTF so I haven't done it.
As it is now, you could get the full space-delimited field as an annotation and then grab part of it in a [[postannotation]] block.
I'll think about how to improve this, if you could explain your use-case more, it might motivate the dev.
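To make the "grab part of it" step concrete: it amounts to pulling one key out of the GTF attribute string. The following is only a minimal sketch in Go, independent of vcfanno's own postannotation mechanism. The helper name gtfAttr and the sample values are hypothetical, and it assumes attribute values contain no semicolons.

```go
package main

import (
	"fmt"
	"strings"
)

// gtfAttr returns the value stored under key in a GTF attribute string such as
// `gene_id "G1"; gene_name "PAX5";`. The boolean is false if the key is absent.
// gtfAttr is a hypothetical helper, not part of vcfanno.
// Assumption: values never contain ';', so splitting on ';' is safe.
func gtfAttr(attrs, key string) (string, bool) {
	for _, field := range strings.Split(attrs, ";") {
		field = strings.TrimSpace(field)
		if field == "" {
			continue
		}
		// Each field has the form `name "value"`; split at the first space.
		name, value, found := strings.Cut(field, " ")
		if !found || name != key {
			continue
		}
		return strings.Trim(strings.TrimSpace(value), `"`), true
	}
	return "", false
}

func main() {
	// Illustrative attribute string, not taken from a real annotation file.
	attrs := `gene_id "G1"; gene_name "PAX5"; exon_number "2";`
	name, ok := gtfAttr(attrs, "gene_name")
	fmt.Println(name, ok)
}
```

If a key occurs more than once in the attribute string, this sketch returns only the first occurrence.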
Well, in my case I just thought of annotating the variants in the .vcf with the EXON_ID(s) and CCDS_ID(s) from Ensembl. So only values from the exon-element lines should be taken into consideration.
I was looking through the code and trying to figure out which module is missing to support true GTF compatibility. Since I am not familiar with the language, I get a bit confused, but let me ask: would an interface in the irelate repository that implements functions like setSource and BamToRelatable be enough, or is there more to it? What I could not find out is which irelate interface is being used for the GTF right now.
Cheers
Here is where generic intervals are parsed using chrom,start,end fields gleaned from the tabix index. https://github.com/brentp/bix/blob/master/bix.go#L172
This would be a moderately involved change, but you are welcome to give it a go and ask questions. Or you could wait and I'll try to dig in by next week.
Hi
Honestly, I feel a bit lost with the Go language and I am not sure I could even get it working. I'll wait and see if you find a solution. I reckon parsing the GTF format is rather easy (compared with VCF): http://mblab.wustl.edu/GTF22.html
GTF support would be particularly helpful because many tools output their results as GTF; off the top of my head, e.g. StringTie. With a StringTie GTF I would be able to easily annotate a mutation with the expression (or the expressed transcripts).
Cheers
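As a rough illustration of how little parsing a GTF line needs, here is a minimal sketch in Go that splits a line into its nine tab-separated columns and keeps only exon features. The type gtfRecord and the function parseGTFLine are hypothetical names and not part of vcfanno, bix, or irelate. It also assumes no trailing comment after the attributes column.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gtfRecord holds the nine tab-separated GTF columns (hypothetical type).
type gtfRecord struct {
	Seqname, Source, Feature string
	Start, End               int // 1-based, inclusive
	Score, Strand, Frame     string
	Attributes               string // raw, unparsed attribute column
}

// parseGTFLine splits one GTF data line into its columns.
// Assumption: the line has exactly nine columns and no trailing comment.
func parseGTFLine(line string) (gtfRecord, error) {
	cols := strings.Split(strings.TrimRight(line, "\r\n"), "\t")
	if len(cols) != 9 {
		return gtfRecord{}, fmt.Errorf("expected 9 columns, got %d", len(cols))
	}
	start, err := strconv.Atoi(cols[3])
	if err != nil {
		return gtfRecord{}, fmt.Errorf("bad start %q: %w", cols[3], err)
	}
	end, err := strconv.Atoi(cols[4])
	if err != nil {
		return gtfRecord{}, fmt.Errorf("bad end %q: %w", cols[4], err)
	}
	return gtfRecord{
		Seqname: cols[0], Source: cols[1], Feature: cols[2],
		Start: start, End: end,
		Score: cols[5], Strand: cols[6], Frame: cols[7],
		Attributes: cols[8],
	}, nil
}

func main() {
	// Illustrative line, not taken from a real annotation file.
	line := "9\tprotein_coding\texon\t100\t200\t.\t-\t.\tgene_id \"G1\"; exon_id \"E1\";"
	rec, err := parseGTFLine(line)
	if err != nil {
		panic(err)
	}
	// Only exon lines are relevant for the exon ID use case.
	if rec.Feature == "exon" {
		fmt.Println(rec.Seqname, rec.Start, rec.End, rec.Attributes)
	}
}
```

The harder part, as noted earlier in the thread, is wiring such records into the interval machinery, not the line parsing itself.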