ibis
ibis copied to clipboard
feat(api): move from .case() to .cases()
WIP. Redo of #7914 on top of the modern code. Changes from that PR:
- Instead of removing the old .case() APIs, I deprecate them here. There are a few tests for them to try to ensure we didn't break anyone.
- keep both SimpleCase and SearchedCase. In previous PRs I removed SimpleCase. But we should support SimpleCase for more idiomatic/performant SQL. Eg now we still have
CASE x WHEN 5 THEN ... - Before,
x.cases((x==5, 3))was allowed. Now it is disallowed, it's too clever. If you want to do that, it's an easy transition to use the top-levelibis.cases() - Added many more test cases, including for dtypes, dshapes, and expected errors on bad construction
- added test that
ibis.null().cases((None, "fill"), else_="not hit")always results in "not hit". Maybe not the best ergonomics, but at least it is consistent and written down. Perhaps we can revisit later. - Added xfail test where it is possible to construct
ibis.cases(("not a bool", "someval")). I think we should fix that upstream in the ops.Literal constructor. - fixed bug where datashape was only getting determined from
baseorcases. Really it needs to depend on ALL inputs. - added some tests and implementation for dealing with empty branches:
ibis.cases()(results in NULL) andibis.cases(else_=5)(results in 5). I considered disallowing these, but I don't think there is anything semantiically wrong with supporting this.
TODOs that I found that should come in followups:
- NULL replacement isn't super consistent yet. For example,
val.substitute({None: 4})currently does a fillna(). But if you doval.cases((None, 4), else_=val), then this ALWAYS hits the else_ case, becausex = NULLnever evals to True. This also isn't even consistent in the sense that it only special cases for pythonNone. If you doibis.null(), or something only known at runtime likeibis.literal(5).nullif(5), then this will always hit the else_ case. Due to these limitations, I vote for making matching against NULLs out of scope for .cases and .substitute. If a user wants to do this, then they better do a .fillna() before.
OK I think this is ready for review! I can break this into smaller commits if you want, I think I now have all the semantics figured out how I like them.
Can you keep around a test case the deprecated builder API?
@kszucs I re-added some simple tests how does that look? I can add the tests for all the edge cases, but I would prefer not to. More of the actual functionality (eg datashape and datatype logic) now happens inside the Operation, which both APIs share, so both APIs should be fairly aligned
pyspark is issuing some ResourceWarning for some unrelated reason, is this some separate bug we should address?
oops, just discovered this is breaking for existing users of Value.cases(). Need to add some compat code in there. ugggh it's gonna be annoying.
OK, probably the most complex thing I do here is the arg, kwarg normalization to maintain compatibility with the old API. I feel pretty good about the way I did it, but I would love some red teaming to try to figure out if there are flaws with it.
Is it a bug that impala is returning a nan instead of None? It sure is annoying to have to do
if exp is None:
assert pd.isna(result)
else:
assert result == exp
just for impala
@kszucs @cpcloud no way we can get this in for 9.0 is there ? The failing CI is unrelated, I think this thing is ready to go after a rebase
It looks like you've branched into ibis-project. This needs to be recreated as a PR coming from your fork.
I think this is too close to the release to get it in for 9.0. We'll probably need to do another non-major release a week or two afterward, so let's get it in first thing after we release 9.
Closing in in favor of https://github.com/ibis-project/ibis/pull/9096 (which is correctly coming from my fork)