
Genotype filter using wildcard with gt_alt_freqs > 0.3

Open OskarSchnappauf opened this issue 5 years ago • 5 comments

Dear gemini team,

I use gemini very frequently, and it is an awesome tool for variant prioritization within large databases. However, one thing I have not yet figured out is how to use gt_alt_freqs in combination with a wildcard. For instance, I want all variants with impact severity MED or HIGH that are carried by at least two affected individuals in our database:

gemini query --header -q "SELECT gene, chrom, start, end FROM variants where impact_severity != 'LOW'" --gt-filter "(gt_types).(Phenotype==2).(==HET).(count >1) and (gt_types).(Phenotype==1).(==HOM_REF).(all)" gemini.db

However, some of the identified variants have a very low gt_alt_freqs. How can I add a gt_alt_freqs threshold for the identified variants? I tried (gt_alt_freqs).(*).(>=0.3).(any), but it did not work.

Thank you very much for your help. Oskar

OskarSchnappauf, Mar 19 '19 20:03

Anyone?

OskarSchnappauf, Mar 29 '19 16:03

When you say it did not work, do you mean you know for certain there are such variants and none were returned?

arq5x, Apr 08 '19 14:04

Hi Aaron, thank you so much for your reply. I don't know whether such variants exist, because the query does not even run; I get an error message.

Here is what I did and what the error message was. I queried the database with this command:

gemini query --header -q "SELECT gene, chrom, start, end FROM variants where impact_severity != 'LOW'" --gt-filter "(gt_types).(Phenotype==2).(==HET).(count >1) and (gt_types).(Phenotype==1).(==HOM_REF).(all) and (gt_alt_freqs).(*).(>=0.3).(any)" gemini.db

And the error message was:

Traceback (most recent call last):
  File "/usr/local/apps/gemini/0.20.1/bin/gemini", line 7, in gemini_main.main()
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gemini_main.py", line 1248, in main
    args.func(parser, args)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gemini_main.py", line 439, in query_fn
    gemini_query.query(parser, args)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gemini_query.py", line 169, in query
    run_query(args)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/gemini_query.py", line 135, in run_query
    gene_needed, args.show_families, subjects=subjects)
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 622, in run
    self.gt_filter = self._correct_genotype_filter()
  File "/usr/local/Anaconda/envs_app/gemini/0.20.1/lib/python2.7/site-packages/gemini/GeminiQuery.py", line 1047, in _correct_genotype_filter
    raise ValueError("Wildcard filter should consist of 4 elements. Exiting.")
ValueError: Wildcard filter should consist of 4 elements. Exiting.

I think it is related to the "." in (>=0.3), since it complains about the number of elements. Any suggestions? Thank you so much, Oskar
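The suspicion about the "." can be checked outside of gemini. The snippet below is only an illustration, assuming the wildcard check counts "." characters in the filter token (as the check quoted later in this thread suggests); `count_dots` is a hypothetical helper, not a gemini function.

```python
# Illustration only: NOT gemini source code.
# Assumption: the wildcard check expects exactly 3 "." separators per token.

working = "(gt_types).(Phenotype==2).(==HET).(count >1)"
failing = "(gt_alt_freqs).(*).(>=0.3).(any)"


def count_dots(token):
    """Hypothetical helper: count every '.' in a wildcard filter token."""
    return token.count(".")


# The integer threshold has only the three separator dots.
print(count_dots(working))  # 3
# The decimal point in 0.3 is counted as a fourth dot.
print(count_dots(failing))  # 4
```

If that assumption holds, any wildcard rule with a decimal threshold would trip the "4 elements" error, regardless of the data in the database.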

OskarSchnappauf, Apr 08 '19 15:04

I encountered the same issue: it raises "ValueError: Wildcard filter should consist of 4 elements". #868 is also the same issue. Uma

udp3f, Apr 29 '19 16:04

This can be fixed by changing the file ....../python2.7/site-packages/gemini/GeminiQuery.py so that the wildcard is split on ").(" rather than on every ".":

Line 1043:
if token.count('.') != 3 or \
becomes
if token.count(').(') != 3 or \

Line 1048:
(column, wildcard, wildcard_rule, wildcard_op) = token.split('.')
becomes
(column, wildcard, wildcard_rule, wildcard_op) = token.split(').(')

I have no idea whether this breaks other functionality, so make a backup of the original file first.
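To try the idea without touching the installed package, here is a minimal standalone sketch of splitting on ").(" instead of ".". The function `split_wildcard` is hypothetical and is not part of gemini. Note that this kind of split leaves a "(" on the first piece and a ")" on the last one; the sketch removes them itself, which may duplicate cleanup that the surrounding code in GeminiQuery.py already does.

```python
# Standalone sketch: NOT gemini source code.
# split_wildcard is a hypothetical name used for illustration only.

def split_wildcard(token):
    """Split '(COLUMN).(WILDCARD).(RULE).(OP)' into its four pieces.

    Splitting on ').(' rather than '.' keeps decimal points inside a
    rule such as '>=0.3' intact.
    """
    token = token.strip()
    if token.count(").(") != 3:
        raise ValueError("Wildcard filter should consist of 4 elements.")
    parts = token.split(").(")
    # Drop the single outer '(' and ')' left over after the split.
    if parts[0].startswith("("):
        parts[0] = parts[0][1:]
    if parts[-1].endswith(")"):
        parts[-1] = parts[-1][:-1]
    return tuple(part.strip() for part in parts)


column, wildcard, rule, op = split_wildcard("(gt_alt_freqs).(*).(>=0.3).(any)")
print(column, wildcard, rule, op)  # gt_alt_freqs * >=0.3 any
```

A filter with only three parenthesized pieces still raises the ValueError, so the element-count check is preserved.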

timothee-revil, Aug 06 '19 17:08