bayeslite
bayeslite copied to clipboard
BQL does not include IN expression
This command is not supported
bdb.execute('select * from satellites where Purpose IN ("Communications", "Reconnaissance")')
File "build/bdist.macosx-10.9-x86_64/egg/bayeslite/bayesdb.py", line 149, in execute
File "build/bdist.macosx-10.9-x86_64/egg/bayeslite/parse.py", line 104, in <genexpr>
File "build/bdist.macosx-10.9-x86_64/egg/bayeslite/parse.py", line 91, in <genexpr>
File "build/bdist.macosx-10.9-x86_64/egg/bayeslite/parse.py", line 77, in parse_bql_phrases
bayeslite.parse.BQLParseError: Parse error: syntax error near Communications
But with sql_execute it is:
bdb.sql_execute('select * from satellites where Purpose IN ("Communications", "Reconnaissance")
In the longer-tmer, is the intention for SQL to co-exist alongside BQL with a separate entry point? Is there a plan for extending BQL to implement a superset of SQL syntax?
In the short-term, are we intending for users to use sql_execute() as well as execute()? Is there a list of SQL features that are not currently supported by BQL somewhere?
This feature would be helpful to run things like this:
estimate dependence probability as dp from pairwise columns of satellites_cc where name0 IN ("Ant
icipated_Lifetime", "Contractor") AND name1 IN ("Purpose", "Eccentricity")
Note that separately, for the above to be run, the user must be aware of the field names "name0" and "name1", which are (I think) currently undocumented.
What is missing is IN
with a collection of literals. This shouldn't be too hard to add, if anyone would like to take a stab at it:
- Add an AST type to
ast.py
for literalIN
expressions, alongside the queryIN
expressions. - Add a grammar rule or two to
grammar.y
, for theequality
symbol. - Add parse actions to
parse.py
for these grammar rules to make the AST nodes. - Add a case to
compile_expression
incompiler.py
to compile it to SQL. - Add some test cases.
Each of these changes should be maybe three or four lines -- not hard, and a worthwhile tour of the parts of the BQL->SQL compiler.
On documentation of columns in the fancy BQL queries: yes, that should go in the cheat sheet we discussed, until someone either determines that the current state of affairs is worth committing to or invents a better one.
On sql_execute
: It is not going away until all of SQL is added to BQL, which may take a long time.
Labeling "help wanted" because the issue now seems to contain enough material for a new contributor to use it as a starter project.
Question: do we actually compute the values correctly for this, even if the explicit lists were supported?
Would we want to revisit resampling for the restricted population, and running some analysis? Or giving the user the option to do that, especially if we're using a small sample of the available data for most of our analysis?
estimate dependence probability as dp from pairwise columns of satellites_cc where name0 IN ("Anticipated_Lifetime", "Contractor") AND name1 IN ("Purpose", "Eccentricity")