ibis
ibis copied to clipboard
test: add test for impure function correlation behavior
Related to https://github.com/ibis-project/ibis/issues/8921, trying to write down exactly what the expected behavior is.
I figure we can use this PR to hash out exaclty what we want the semantics to be, and then the other discussions might be easier because our goal is written down precisely somewhere. Please let me know if you agree or disagree with this behavior, or if there are other tests we should add.
Need to fix the UDF test case in a followup. Also wasn't sure where to put these tests, I put them in their own little file but if you point me elsewhere Iwill move them.
oooh, these CI failures are revealing differences in the backend behaviors... wheeee into the 🐰 🕳️ we go!
I can think of two reasons a user might care about the semantics here, and I hope we can support both of their needs:
- impureness/correlatedness. If a function is impure, they may want it to be executed once, or many times, depending on if they want them to be correlated. See this example
- performance. If some computation is slow, they only want it to happen a single time.
So I think this means that we as ibis authors can't assume what the goals of the user is, and how many times they want an expression executed. Therefore, we shouldn't do any clever rewrites or mergings of selects. I think we need to keep a more 1:1 correspondence between what the user writes and the SQL we produce: Every time the user does a .select(), .mutate(), etc, (except for simple column renamings, and maybe a few other cases) that leads to exactly one more with select .... as ...
in the generated SQL.
IDK, what do you think of this train of reasoning? I think I'm fairly convinced that those two use cases are the requirements for success, but perhaps there is a different/better way of accomplishing that goal
@NickCrews please rebase to test with https://github.com/ibis-project/ibis/pull/9023 change included
@NickCrews My only objection would be that
performance. If some computation is slow, they only want it to happen a single time.
Is not something we can enforce even if we never merge any select statements. This kind of guarantee is at the level of the query engine.
@cpcloud yup you are right with guarantees, "suggesting" to the backend is the best we can do.
What do you think in general of my proposal of "one CTE per .select"? I'm not sure if you're skeptical of the whole thing or just the performance claims... Thanks!
I'm trying to decide what is higher priority:
An implementation that is faster 90% of the time, but does clever things and therefore isn't able to be tuned by the user that 10% of the time they need it
Vs
An implementation that is a bit slower in the majority of cases, but is always fine tunable to get the perf you need in the edge cases.
slowly going through the backends and adding the correct marks for each kind of failure...
@cpcloud @kszucs I think this is ready for review whenever you get the chance! I think this is the groundwork for defining the current state, and after we get this in then we can start talking about what we think ideal behavior should be, and how to get there.
Anything I can do here to help move this forward?
@NickCrews Can you write a docstring for each of these tests explaining what's being tested in each? I know that's atypical, but this is a very hairy problem with lots of specifics that are important to understand, and words like "impure" and "correlation" need to be precisely defined so that everyone is on the same page about exactly what is being tested.
I think that is a good idea. Will do.
oops, I just wrote the notes as comments, not docstrings, I assume that is still OK? I think this is ready to review now, thanks!
@cpcloud this is ready for a review whenever you get the chance! (assuming CI is still green after the rebase I just did)
The nix CI failure looks like a flake, but IDK what is going on with postgres, any idea? just mark as notimpl?
I'm not sure what you mean by flake here. The new test you added for impure function behavior is breaking in some strange way, which is something we have discussed in other issues as a likely consequence of depending on state outside the function, in this case generating random numbers, which almost certainly involves operating system state.
I'm pretty much ready to declare bankruptcy for now on any guarantees for any impure functions runs as UDFs as well as on guarantees about the semantics of rand()
. People are pretty much on their own right now if they're expecting something resembling reasonable behavior from stateful UDFs.
I would still love to enforce the test_impure_correlated
test and the test_chained_selections
test, so we can enforce that
table.select(common=ibis.random()).select(x="common", y="common")` works as expected, on as many backends as possible. This is a real pain point for me, currently I have to .cache() the tables in intermediate stages to get this guarantee, and I'm scared I'm going to forget to do it sometime. It appears that this will never work on Trino or sqlite, since they don't follow the SQL spec. Is the main issue here that we want ibis to give the same behavior, no matter the backend you use? That seems unachievable, but are there any other concerns here that I could try to mitigate?
The test_impure_uncorrelated
test I am happy to drop. I'm not sure that is part of the SQL standard, and anyway it is not nearly so obvious what select RANDOM() as r1, RANDOM() as r2
is supposed to give.
- If users want guaranteed correlation, they can do a multi-step CTE as above:
t.select(r1=ibis.random()).mutate(r2="r1")
- If users want guaranteed uncorrelation, they can do a multi-step CTE as
t.select(r1=ibis.random()).mutate(r2=ibis.random())
(or perhaps I'm mistaken, and even this won't guarantee uncorrelation on some backends??)
This is related to https://github.com/ibis-project/ibis/pull/9065, where we decided to merge selects by default, but provide an option for not merging them. I think we need to never merge them, and remove that option.
I think we can keep the same merging strategy we have now, which is based on a simple notion of complexity comparison, and treat impure (random, uuid) and potentially impure (UDFs) functions as maximally complex which will prevent projections with those expressions from being merged.
@NickCrews I think I've managed to come up with a decent compromise, which is to treat ops.Impure
instances as having maximum complexity, and thus prevent merging selects with those operations while retaining the ability to merge pure-function-only selects.
Doing this exposed a few bugs too, which I've fixed here.
Nice job finding those bugs and fixing them! Looks great. I was surprised when I had to make duckdb as xfail, I thought these semantics they would get right.
Reminder to myself: I want to also add other possibly impure functions to test_impure_correlated
, such as Array.collect()
and .arbitrary()
.
I think that compromise using complexity works for me. If it were me, I think I would opt for not merging, (more complex SQL, but simpler implementation and guaranteed as-correct-as-possible), but I'll live with it. For example, if we add more impure functions in the future, it would be very possible to accidentally not inherit from ops.Impure
, and we would have a bug.
Collect and arbitrary are not impure. Whatever non determinism there is you have to address in some other way.
...and the current way to do that is with the order_by keyword argument that landed in 9.3.