bayeslite icon indicating copy to clipboard operation
bayeslite copied to clipboard

BQL does not include IN expression

Open marcoct opened this issue 9 years ago • 5 comments

This command is not supported

bdb.execute('select * from satellites where Purpose IN ("Communications", "Reconnaissance")')
  File "build/bdist.macosx-10.9-x86_64/egg/bayeslite/bayesdb.py", line 149, in execute
  File "build/bdist.macosx-10.9-x86_64/egg/bayeslite/parse.py", line 104, in <genexpr>
  File "build/bdist.macosx-10.9-x86_64/egg/bayeslite/parse.py", line 91, in <genexpr>
  File "build/bdist.macosx-10.9-x86_64/egg/bayeslite/parse.py", line 77, in parse_bql_phrases
bayeslite.parse.BQLParseError: Parse error: syntax error near Communications

But with sql_execute it is:

bdb.sql_execute('select * from satellites where Purpose IN ("Communications", "Reconnaissance")

In the longer-tmer, is the intention for SQL to co-exist alongside BQL with a separate entry point? Is there a plan for extending BQL to implement a superset of SQL syntax?

In the short-term, are we intending for users to use sql_execute() as well as execute()? Is there a list of SQL features that are not currently supported by BQL somewhere?

marcoct avatar Oct 18 '15 23:10 marcoct

This feature would be helpful to run things like this:

estimate dependence probability as dp from pairwise columns of satellites_cc where name0 IN ("Ant
icipated_Lifetime", "Contractor") AND name1 IN ("Purpose", "Eccentricity") 

Note that separately, for the above to be run, the user must be aware of the field names "name0" and "name1", which are (I think) currently undocumented.

marcoct avatar Oct 18 '15 23:10 marcoct

What is missing is IN with a collection of literals. This shouldn't be too hard to add, if anyone would like to take a stab at it:

  1. Add an AST type to ast.py for literal IN expressions, alongside the query IN expressions.
  2. Add a grammar rule or two to grammar.y, for the equality symbol.
  3. Add parse actions to parse.py for these grammar rules to make the AST nodes.
  4. Add a case to compile_expression in compiler.py to compile it to SQL.
  5. Add some test cases.

Each of these changes should be maybe three or four lines -- not hard, and a worthwhile tour of the parts of the BQL->SQL compiler.

riastradh-probcomp avatar Oct 19 '15 16:10 riastradh-probcomp

On documentation of columns in the fancy BQL queries: yes, that should go in the cheat sheet we discussed, until someone either determines that the current state of affairs is worth committing to or invents a better one.

On sql_execute: It is not going away until all of SQL is added to BQL, which may take a long time.

riastradh-probcomp avatar Oct 19 '15 16:10 riastradh-probcomp

Labeling "help wanted" because the issue now seems to contain enough material for a new contributor to use it as a starter project.

axch avatar Oct 19 '15 21:10 axch

Question: do we actually compute the values correctly for this, even if the explicit lists were supported?

Would we want to revisit resampling for the restricted population, and running some analysis? Or giving the user the option to do that, especially if we're using a small sample of the available data for most of our analysis?

estimate dependence probability as dp from pairwise columns of satellites_cc where name0 IN ("Anticipated_Lifetime", "Contractor") AND name1 IN ("Purpose", "Eccentricity")

gregory-marton avatar Nov 19 '15 20:11 gregory-marton