FunSQL.jl icon indicating copy to clipboard operation
FunSQL.jl copied to clipboard

bound references are not rebased

Open xitology opened this issue 4 years ago • 8 comments

using FunSQL: SQLTable, From, Define, Where, Get, render

person = SQLTable(:person, :person_id, :year_of_birth, :location_id)

ByYOB(yob) =
    (q = Define(:yob => Get.year_of_birth)) |>
    Where(q.yob .== yob)

q = From(person) |>
    ByYOB(1950)

print(render(q))
#=>
ERROR: GetError: cannot find yob in:
let person = SQLTable(:person, …),
    q1 = From(person),
    q2 = q1 |> Define(Get.year_of_birth |> As(:yob)),
    q3 = Define(Get.year_of_birth |> As(:yob)),
    q4 = q2 |> Where(Fun."=="(q3.yob, Lit(1950)))
    q4
end
=#

xitology avatar Sep 20 '21 13:09 xitology

Just ran into this issue. If I understand correctly, ~binary operators in programming languages are typically left associative~ (associativity is not relevant, what matters is if we store relationships in the parent node, or the child node), so rebasing q >> Where(q.yob ...) produces a copy of the node q to put in the final query tree, while the original q is left detached.

It is an interesting conundrum. When the bound reference is created, q could also store a reference to Get.yob, to go back and update it whenever q is rebased. However, this makes tabular nodes stateful which might be problematic when reusing nodes. Do you see any other way around it?

ananis25 avatar Jun 30 '22 12:06 ananis25

So, this error is not limited to bound references. Consider the following example.

table = SQLTable(:table, columns=[:x, :y])
function self_join()
    q = Where(Get.x .== 100)
    return q |> Join(q |> As(:alias), on=Get.x .== Get.alias.x) |> Select(Get.y)
end

query = From(table) |> self_join()
print(render(query))

# ERROR: FunSQL.ReferenceError: cannot find x in:
# let table = SQLTable(:table, …),
#     q1 = From(table),
#     q2 = Get.x,
#     q3 = Lit(100),
#     q4 = Fun."=="(q2, q3),
#     q5 = q1 |> Where(q4),
#     q6 = Where(q4),
#     q7 = q5 |> Join(q6 |> As(:alias), Fun."=="(Get.x, Get.alias.x)),
#     q8 = q7 |> Select(Get.y)
#     q8
# end

So, the issue really seems to be that FunSQL queries are not trees but DAGs. When we rebase the query pipeline A-B-C, we leave node E detached. dags_not_trees

I haven't found a nice way to solve this yet.

  1. Tracking bound references with a tabular node as parent in the parent isn't sufficient, since Join nodes see the same error.

  2. The simple solution is to make the query tree mutable and make rebase an operation that just descends down the line of parent nodes, and swaps the null value at the end. That is, X |> (A |> B |> C) just replaces the parent node for A.

    However, now we need to ensure query fragments are never reused, which is not nice. The neat thing about FunSQL is that the data structures are kind of immutable, and can be freely composed.

  3. Visit and collect all SQLNodes in the query tree that need to be rebased; go over them in the topological order, i.e A, B, E, C in the diagram and make a copy of each.

    Given each SQLNode has multiple attributes that could be nodes themselves, this would be an expensive and tedious.

What do you think?

ananis25 avatar Jul 06 '22 15:07 ananis25

It is an interesting conundrum. When the bound reference is created, q could also store a reference to Get.yob, to go back and update it whenever q is rebased. However, this makes tabular nodes stateful which might be problematic when reusing nodes. Do you see any other way around it?

This is the place where FunSQL is a bit inconsistent. In general, FunSQL ignores the node identity, which means that any node in a query can be replaced with its structural copy. However, this is not valid when the node has a bound reference. In particular, a bound reference can be invalidated by query composition (|>) because composition rewrites its right argument.

q = a |> b |> Where(b.x)

Here, composition a |> b deconstructs the original object b, which invalidates the reference b.x.

This issue can be fixed by having composition preserve a reference to the original node, but is it worth it considering that the problem is pretty obscure? An easy workaround is to rewrite the query like this:

b = a |> b
q = b |> Where(b.x)

xitology avatar Jul 11 '22 17:07 xitology

Hmm, that is a good point; it indeed is not a common occurrence. Though it goes against the FunSQL idea a bit, that sql queries written in modular functions can be composed freely.

For example, I can imagine people factoring out self-joins as a separate function. If the output node from self_join ever gets rebased, the right hand side of the join will be left detached.

function self_join(q, col_name)
    return q |> Join(q |> As(:alias), on=Get(col_name) .== Get.alias >> Get(col_name))
end

The solution is making sure users aware how rebase works, which I feel is an abstraction leak.

ananis25 avatar Jul 12 '22 18:07 ananis25

This issue can be fixed by having composition preserve a reference to the original node

The downside, besides implementation time, is added maintenance due to more complex code base?

clarkevans avatar Jul 12 '22 19:07 clarkevans

So, this error is not limited to bound references. Consider the following example.

table = SQLTable(:table, columns=[:x, :y])
function self_join()
    q = Where(Get.x .== 100)
    return q |> Join(q |> As(:alias), on=Get.x .== Get.alias.x) |> Select(Get.y)
end

query = From(table) |> self_join()
print(render(query))

# ERROR: FunSQL.ReferenceError: cannot find x in:
# let table = SQLTable(:table, …),
#     q1 = From(table),
#     q2 = Get.x,
#     q3 = Lit(100),
#     q4 = Fun."=="(q2, q3),
#     q5 = q1 |> Where(q4),
#     q6 = Where(q4),
#     q7 = q5 |> Join(q6 |> As(:alias), Fun."=="(Get.x, Get.alias.x)),
#     q8 = q7 |> Select(Get.y)
#     q8
# end

So, the issue really seems to be that FunSQL queries are not trees but DAGs. When we rebase the query pipeline A-B-C, we leave node E detached. dags_not_trees

I haven't found a nice way to solve this yet.

1. Tracking bound references with a tabular node as parent in the parent isn't sufficient, since `Join` nodes see the same error.

2. The simple solution is to make the query tree _mutable_ and make `rebase` an operation that just descends down the line of parent nodes, and swaps the null value at the end. That is, `X |> (A |> B |> C)` just replaces the parent node for A.
   However, now we need to ensure query fragments are never reused, which is not nice. The neat thing about FunSQL is that the data structures are kind of immutable, and can be freely composed.

3. Visit and collect all SQLNodes in the query tree that need to be rebased; go over them in the topological order, i.e `A, B, E, C` in the diagram and make a copy of each.
   Given each SQLNode has multiple attributes that could be nodes themselves, this would be an expensive and tedious.

What do you think?

You cannot construct this DAG by reusing the object q. This is because FunSQL ignores the identity of the node objects (bound references are an exception), which means any node in a query could be replaced with a copy without changing the meaning of the query. In your example, it means you could equivalently write

q1 = Where(Get.x .== 100)
q2 = Where(Get.x .== 100)
return q1 |> Join(q2 |> As(:alias), on=Get.x .== Get.alias.x) |> Select(Get.y)

and it must produce exactly the same query.

One way to solve this is by moving the variable q to the FunSQL level, that is, using With to create a tabular "variable":

From(:q) |>
Join(From(:q) |> As(:alias), on = Get.x .== Get.alias.x)) |>
Select(Get.y) |>
With(From(table) |> Where(Get.x .== 100) |> As(:q))

This does not yet let you write a self_join() function. To make self_join() possible, we need a variant With′ (need a better name) of With that exchanges the order of arguments: q1 |> With(q2) is equivalent to q2 |> With′(q1). Then you can write

self_join() =
    Where(Get.x .== 100) |>
    As(:q) |>
    With′(
        From(:q) |>
        Join(From(:q) |> As(:alias), on = Get.x .== Get.alias.x) |>
        Select(Get.y))

From(table) |> self_join()

xitology avatar Jul 14 '22 12:07 xitology

You cannot construct this DAG by reusing the object q. This is because FunSQL ignores the identity of the node objects (bound references are an exception), which means any node in a query could be replaced with a copy without changing the meaning of the query. In your example, it means you could equivalently write

Right, would you consider "a node can be replaced with a structural copy" to be an implementation detail?

It definitely makes query tree transformations easy and error surface smaller by making nodes immutable, though I would have liked the self-join query to work :).

ananis25 avatar Jul 14 '22 14:07 ananis25

Right, would you consider "a node can be replaced with a structural copy" to be an implementation detail?

This is just a different way to say algebraic or referentially transparent.

xitology avatar Jul 14 '22 19:07 xitology

Bound references are no longer supported in FunSQL v0.13.0.

xitology avatar Feb 26 '24 15:02 xitology