vcfanno
Multiple Issues in vcfanno 0.3.5: [[postannotation]] Parsing, GENE not found in INFO, and non-bgzip Fallback
Hello,
I’ve encountered three distinct issues with vcfanno version 0.3.5 (built with Go 1.22.2) while annotating a VCF file against COSMIC v101 GRCh37 data:
- [[postannotation]] Parsing Failure: The [[postannotation]] block with ops=["lua:..."] fails with "must specify an 'op' for postannotation", despite the ops field being present.
- GENE not found in INFO Warning: A persistent warning appears when extracting GENE from annotation files, even though it exists in the source VCFs.
- Falling Back to non-bgzip: vcfanno falls back to non-bgzip mode despite all input and annotation files being bgzip-compressed and tabix-indexed, impacting performance.
vcfanno Version: 0.3.5 [built with go1.22.2]
OS: Ubuntu 24.04.1 LTS
Kernel: Linux 6.8.0-52-generic
Architecture: x86-64
INPUT: input.vcf.gz: A bgzipped VCF file (readable, tabix-indexed).
Annotation files: COSMIC v101 GRCh37 VCFs (bgzipped, tabix-indexed):
- Cosmic_GenomeScreensMutant_v101_GRCh37_sorted.vcf.gz
- Cosmic_CompleteTargetedScreensMutant_v101_GRCh37_sorted.vcf.gz
- Cosmic_NonCodingVariants_v101_GRCh37_sorted.vcf.gz
Issue 1: [[postannotation]] Parsing Failure: My config file looks like this:
[[annotation]]
file="/mnt/Molpath/ref_data/homo_sapiens/COSMIC/v101/GRCh37/Cosmic_GenomeScreensMutant_v101_GRCh37_sorted.vcf.gz"
fields=["ID","GENOME_SCREEN_SAMPLE_COUNT","GENE"]
names=["CGS_ID","CGS_count","CGS_gene"]
ops=["first","first","first"]
[[annotation]]
file="/mnt/Molpath/ref_data/homo_sapiens/COSMIC/v101/GRCh37/Cosmic_CompleteTargetedScreensMutant_v101_GRCh37_sorted.vcf.gz"
fields=["ID","GENE","TARGETED_SCREEN_SAMPLE_COUNT"]
names=["CTS_ID","CTS_gene","CTS_count"]
ops=["first","first","first"]
[[annotation]]
file="/mnt/Molpath/ref_data/homo_sapiens/COSMIC/v101/GRCh37/Cosmic_NonCodingVariants_v101_GRCh37_sorted.vcf.gz"
fields=["ID","SAMPLE_COUNT","GENE"]
names=["CNC_ID","CNC_count","CNC_gene"]
ops=["first","first","first"]
[[postannotation]]
fields=["CTS_ID","CGS_ID","CNC_ID"]
names=["ID"]
ops=["lua:select_id"]
My lua script (select_id.lua):
function select_id(cts_id, cgs_id, cnc_id)
    if cts_id and cts_id ~= "NA" then
        return cts_id
    elseif cgs_id and cgs_id ~= "NA" then
        return cgs_id
    elseif cnc_id and cnc_id ~= "NA" then
        return cnc_id
    else
        return "NA"
    end
end
Run: /opt/vcfanno/vcfanno -lua select_id.lua -p 1 test.conf input.vcf.gz > test.vcf
Error message:
=============================================
vcfanno version 0.3.5 [built with go1.22.2]
see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:104: error in postannotation section err: must specify an 'op' for postannotation
No output VCF is generated.
Additional Observations:
- Lua scripts are syntactically valid (tested with lua -e "dofile('select_id.lua')").
- The error persists with absolute paths and minimal configs.
- Previous runs without [[postannotation]] (e.g., just `[[annotation]]` blocks) succeed, annotating ~100k variants in ~60s.
For now I am using the following workaround: sequential [[annotation]] blocks with a shared ID field (TargetedScreens last, so that it has the highest priority).
Issue 2: GENE not found in INFO Warning
For this I used the same config file as above (without the [[postannotation]] part) and the command line:
/opt/vcfanno/vcfanno -p 1 test.conf input.vcf.gz > test.vcf
The tool returns a warning, although the annotation succeeds: vcfanno.go:195: Info Error: GENE not found in INFO >> this error/warning may occur many times. reporting once here.... I already checked and "GENE" is present in the COSMIC VCFs. Most likely the warning is triggered by variants in one COSMIC file lacking GENE.
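One way to test the hypothesis that some records lack GENE is to count them per annotation file. The following is a minimal Python sketch, not part of vcfanno; the function names are hypothetical, and it assumes a standard tab-separated VCF with INFO in the eighth column.

```python
import gzip
from typing import Iterable, Tuple


def count_missing_info_key(lines: Iterable[str], key: str) -> Tuple[int, int]:
    """Return (records_missing_key, total_records) for VCF text lines."""
    missing = 0
    total = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        total += 1
        columns = line.rstrip("\n").split("\t")
        info = columns[7] if len(columns) > 7 else ""
        # INFO entries are ';'-separated; each is KEY=VALUE or a bare flag.
        keys = {entry.split("=", 1)[0] for entry in info.split(";")}
        if key not in keys:
            missing += 1
    return missing, total


def count_missing_in_file(path: str, key: str) -> Tuple[int, int]:
    """Same count, reading a gzip- or bgzip-compressed VCF from disk."""
    # BGZF is a series of concatenated gzip members, which the gzip module reads.
    with gzip.open(path, "rt") as handle:
        return count_missing_info_key(handle, key)
```

Calling `count_missing_in_file(path, "GENE")` on each COSMIC file would show which of them, if any, contains records without the key.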
Issue 3: Falling Back to non-bgzip
I'm using the same configuration and command line as above. All files are bgzip-compressed (file reports Blocked GNU Zip Format (BGZF; gzip compatible)) and have valid .tbi indexes. However, I always get this message:
vcfanno.go:157: falling back to non-bgzip
vcfanno.go:250: annotated 100807 variants in 62.75 seconds (1606.6 / second)
I would really appreciate it if you could tell me how to fix Issue 1, so that I can use lua scripts. Furthermore, I would like to know why GENE is reported as not found and how to suppress the warning when GENE is present in the annotation files, as well as how to fix the bgzip detection failure. Thanks for your help.
Best regards, Mihaela
Hi, for issue1 you would instead use:
[[postannotation]]
fields=["CTS_ID","CGS_ID","CNC_ID"]
name="ID"
op="lua:select_id"
issue2
This just means that field is not found in some rows of the VCF. It's probably safe to ignore
issue3
This would mean that your query file is not bgzipped and indexed. Can you verify that's the case?
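For anyone who wants to verify the compression independently of the `file` utility, here is a minimal Python sketch that inspects the first gzip member's header for the BGZF extra subfield ('B', 'C', length 2) described in the SAM/BAM specification. The function names are hypothetical and not part of vcfanno; the check only tells whether a file starts with a BGZF block, not why vcfanno chooses its fallback.

```python
import struct


def looks_like_bgzf(header: bytes) -> bool:
    """True if the bytes start with a gzip header carrying the BGZF 'BC' subfield."""
    # gzip magic (1f 8b), deflate (08), FLG.FEXTRA set (04)
    if len(header) < 12 or header[:4] != b"\x1f\x8b\x08\x04":
        return False
    (xlen,) = struct.unpack("<H", header[10:12])  # length of the extra field
    extra = header[12:12 + xlen]
    pos = 0
    # Walk the extra subfields: SI1, SI2, SLEN (2 bytes, little-endian), payload.
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        (slen,) = struct.unpack("<H", extra[pos + 2:pos + 4])
        if si1 == 66 and si2 == 67 and slen == 2:
            return True
        pos += 4 + slen
    return False


def file_is_bgzf(path: str) -> bool:
    """Read just enough of the file to run the header check."""
    with open(path, "rb") as handle:
        fixed = handle.read(12)
        if len(fixed) < 12:
            return False
        (xlen,) = struct.unpack("<H", fixed[10:12])
        return looks_like_bgzf(fixed + handle.read(xlen))
```

A plain `gzip` file fails the check because it lacks the extra field, whereas output from `bgzip` should pass it.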
Hi Brent,
thanks for your comments. I will try your suggestion for issue 1 and ignore issue 2. Regarding issue 3, I already tested the files, and they are bgzipped and indexed.
Best,
Mihaela
I see, there's a weird logic error in the code. You must be on a machine with few CPUs, right? You can do :
ln -s $your.vcf.gz $your.vcf.bgz
and then run vcfanno ... $your.vcf.bgz
and it should use bgzip.
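The same workaround can be scripted when several query files need the alternative suffix. This is a minimal Python sketch under the assumption that the path ends in .gz; `link_with_bgz_suffix` is a hypothetical helper name, and whether vcfanno also expects an index next to the link is not covered in this thread.

```python
import os


def link_with_bgz_suffix(vcf_gz_path: str) -> str:
    """Create a symlink that ends in .bgz instead of .gz and return its path."""
    if not vcf_gz_path.endswith(".gz"):
        raise ValueError("expected a path ending in .gz")
    link_path = vcf_gz_path[: -len(".gz")] + ".bgz"
    # Leave an existing file or link untouched so the call can be repeated safely.
    if not os.path.lexists(link_path):
        os.symlink(os.path.abspath(vcf_gz_path), link_path)
    return link_path
```

The returned path (for example input.vcf.bgz for input.vcf.gz) is then passed to vcfanno in place of the original file name.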
Yes. Thanks for your suggestion. I will try it out.
Hi,
your suggestion for issue 3 worked. Adding the "bgz" ending does the trick. Issue 1 still had a few errors in it, but I got it to run. Thanks for the fast support :)
Best, Mihaela