StructuredQueries.jl
at-querying a Query
Suppose a user produces a Query:
qry = @query filter(:src, A > .5) |>
select(B, C)
It seems reasonable that a user ought to be able to extend this query by using it as the source of another query:
qry2 = @query groupby(qry, C)
More specifically, collecting against qry2 should have the same result as collecting against
@query filter(:src, A > .5) |>
select(B, C) |>
groupby(C)
I can see two ways to achieve this desired behavior:
- At the level of the Query object itself, via a constructor:
function Query(source::Query, graph::QueryNode)
    graph.input = source.graph
    Query(source.source, graph)
end
- At the level of collect:
collect(qry::Query, q::QueryNode) = collect(collect(qry), q)
I think I slightly prefer the first way.
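To make the first option concrete, here is a minimal sketch of how such a constructor could graft a new node onto an existing query graph. The type definitions and field names (verb, args, input, source, graph) are assumptions made up for illustration and are not taken from the package. It also assumes the new graph is a single node; a multi-node graph would need the old graph attached at its leaf instead.

```julia
# Hypothetical stand-ins for the package's types; all field names are assumptions.
mutable struct QueryNode
    verb::Symbol
    args::Vector{Any}
    input::Union{QueryNode, Nothing}
end

struct Query
    source::Any
    graph::QueryNode
end

# Extend an existing Query: the old graph becomes the input of the new node,
# and the original data source is carried over unchanged.
function Query(source::Query, graph::QueryNode)
    graph.input = source.graph
    return Query(source.source, graph)
end

# filter(:src, A > 0.5) |> select(B, C), written out by hand
base = Query(:src,
             QueryNode(:select, Any[:B, :C],
                       QueryNode(:filter, Any[:(A > 0.5)], nothing)))

# groupby(qry, C): a single new node whose source is the existing Query
extended = Query(base, QueryNode(:groupby, Any[:C], nothing))
```

With this shape, extended has the same source as base and a graph that reads groupby, then select, then filter, which is the chain the fully piped query would produce.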
Would it be qry2 = @query groupby(qry, C) or qry2 = @query groupby($qry, C)?
It would be the former. If @query sees that a manipulation command, e.g. groupby, is not piped an argument, then the macro assumes that the first argument must name a data source, rather than be a query argument. Interpolation is only necessary if the value appears in the context of a query argument, e.g. an expression to be mapped over columns.
Interpolation is only necessary if the value appears in the context of a query argument, e.g. an expression to be mapped over columns.
Might that not be supported in the future?
It will be supported, either as interpolation or something of a "prepared statements" API.
It will be supported, either as interpolation or something of a "prepared statements" API.
Thanks for clarifying, so
qry2 = @query groupby($qry, C)
will result in a "prepared statement", whereas
qry2 = @query groupby(qry, C)
might result in something different?
Oh, I see what you were asking. No, there won't be interpolation or prepared statements for data sources. I see both interpolation and prepared statements as answers to the question, How do I refer to a value outside the "scope" of @query inside a query argument to a manipulation verb? In my mind, the query argument realm -- i.e., non-data source arguments passed to manipulation verbs like groupby -- is entirely agnostic about how a data source is specified. Interpolation and prepared statements belong to that realm.
In the case of extending a Query, one is treating the Query object as a data source, and so mention of it within @query does not belong to the realm with which interpolation and prepared statements are concerned. The analogue to "interpolation" behavior for sources is the dummy source functionality.
Maybe a good way to summarize is: Interpolation/prepared statements let you use different values in the same query, e.g. different values for c in filter(tbl, A > $c). Dummy sources let you collect the same query against different backends.
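The distinction can be sketched in plain Julia, without any StructuredQueries machinery. Everything below is an illustrative stand-in: prepared is a hypothetical helper, and the "tables" are just vectors of NamedTuples.

```julia
# Plain-Julia stand-in, not the package API.
# The filtering logic is fixed; the parameter c and the source vary independently.
prepared(c) = src -> [row for row in src if row.A > c]

tbl1 = [(A = 1, B = "x"), (A = 3, B = "y")]
tbl2 = [(A = 5, B = "z")]

# Same query and source, different parameter values
# (the role played by interpolation / prepared statements)
loose  = prepared(0)(tbl1)
strict = prepared(2)(tbl1)

# Same query and parameter, different sources
# (the role played by dummy sources)
q = prepared(2)
from_tbl1 = q(tbl1)
from_tbl2 = q(tbl2)
```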
That's a good enough distinction for me, thanks!
So it should be
x = 15
tbl1 = # some datasource
tbl2 = # some datasource
...
qry = @query :table1 |> innerjoin(:table2, ...) |> where(table2.col1 > $x)
collect(qry, table1 = tbl1, table2 = tbl2)
rather than
qry = @query :table1 |> innerjoin($tbl2, ...) |> where(table2.col1 > $x)
collect(qry, table1 = tbl1)
?
Interpolation and prepared statements belong to that realm. [...] The analogue to "interpolation" behavior for sources is the dummy source functionality.
How about the following proposal(s):
- for the analog of prepared statements, we introduce "dummy" placeholders (via :) to be filled in via keyword args in collect() (EDIT: or binding!()) later on.
- table.column_name and column_name are both allowed
- [If supported] for interpolation/splicing (via $) to retain the same meaning as they do in Julia MetaProgramming.
Actually, I think all mentions of dummy sources within @query will require prepending with :. So it would be
qry = @query :table1 |> innerjoin(:table2, ...) |> where(:table2.col1 > $x)
collect(qry, table1 = tbl1, table2 = tbl2)
I take the table2 without prepending the : to be a direct reference to the object table2 in the scope in which @query is invoked. So the above would be equivalent to
@collect tbl1 |> innerjoin(tbl2, ...) |> where(tbl2.col1 > $x)
whereas
qry = @query :table1 |> innerjoin(:table2, ...) |> where(table2.col1 > $x)
collect(qry, table1 = tbl1, table2 = tbl2)
would be equivalent to
@collect tbl1 |> innerjoin(tbl2, ...) |> where(table2.col1 > $x)
An alternative to having to repeatedly prepend : would be using an alias:
qry = @query begin
tbl = :table2
table1 |> innerjoin(tbl, ...) |> where(tbl.col1 > $x)
end
As for your proposals, here are my thoughts:
for the analog of prepared statements, we introduce "dummy" placeholders (via :) to be filled in via keyword args in collect() later on.
I can see why you want to unify the dummy source and prepared statements functionalities, but I do like having the syntax reflect the conceptual distinction between collecting a (fixed) query against different sources and collecting a prepared query with varying parameter values against a fixed source. One may want (I don't exactly know why, but I don't see why we shouldn't support it) to bind different values to a parametrized (prepared) Query without collecting it, in which case the binding mechanism ought to be different than collect -- e.g. something like
qry = @query tbl |>
filter(A > c::Int) |>
select(B)
for _c in [1, 2, 3]
    bind!(qry, c = _c)
    do_something(qry)
end
The second point concerning using : for both dummy sources and parametrized queries is that I think the syntax for the latter may need to include some way of specifying the type of values that the parameter will take. Though it's possible that this won't be necessary, and that we will be able to place function barriers inside the collect machinery for column-indexable tabular data structures in such a way that allows type inference to figure out what's going on when we map, say, a filtering lambda over not only a tuple of columns but also over query parameters.
Finally, there's an argument against using : to signify interpolation that applies equally to using : to designate query parameters, and that is that it renders the user unable to talk about Symbol literals in query arguments. For instance, tbl[:A] may be a column of Symbol objects, but if : denotes a query parameter then you can't naively express the query "select the subset of rows of tbl where the A attribute is equal to :a" with
@collect filter(tbl, A == :a)
-- you'd have to do
qry = @query filter(tbl, A == :a)
collect(qry, a = :a) # or, as I'd prefer, `bind`
which I'm not really a fan of.
Note that the dummy source functionality doesn't run into this problem because, within @query, sources don't appear within query arguments. Now, if we use : to designate dummy sources and allow :alias.attribute as an identifier, then :alias does appear within a query argument. However, it is very distinguishable from a literal Symbol argument, since, as an object in a Julia AST, :alias.attribute is not a Symbol literal but rather an Expr with head :., which may be parsed appropriately.
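The difference is easy to check with Base Julia alone. The sketch below assumes Julia 1.x, where a quoted symbol inside a parsed expression shows up as a QuoteNode. The helper is_dummy_ref is hypothetical and only illustrates how a macro could tell the two cases apart; it is not part of the package.

```julia
# Base Julia only; no package code involved.
prefixed = Meta.parse(":alias.attribute")   # dummy-source-qualified column
literal  = Meta.parse("A == :a")            # comparison against a Symbol literal

# Hypothetical helper: a field-access Expr whose left-hand side is a quoted symbol.
is_dummy_ref(ex) = ex isa Expr &&
                   ex.head == Symbol(".") &&
                   ex.args[1] isa QuoteNode

# The prefixed form is a field-access expression ...
prefixed_is_ref = is_dummy_ref(prefixed)

# ... whereas the :a in `A == :a` is a bare quoted Symbol, not an Expr.
rhs = literal.args[3]
rhs_is_ref = is_dummy_ref(rhs)
```

Here prefixed_is_ref is true and rhs_is_ref is false, so a macro walking the query arguments could leave Symbol literals alone while still picking out :alias.attribute.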
table.column_name and column_name are both allowed
Yes, and we will provide a definitive way to communicate to which source an un-prefixed column_name is to belong.
[If supported] for interpolation/splicing (via $) to retain the same meaning as they do in Julia MetaProgramming.
This tentatively sounds good, too -- I'll need to reason through this a bit more and see if it makes sense, since technically one is not interpolating into an Expr object as one does in Julia metaprogramming. But I agree with the spirit of this suggestion.
Also, I'll add that if I had to choose $ for use in either a prepared statements API or an interpolation API, I think I'd opt for the former, since I think it will accomplish what folks wish to do with the latter, but more efficiently.