ibis icon indicating copy to clipboard operation
ibis copied to clipboard

feat(api): move from .case() to .cases()

Open NickCrews opened this issue 1 year ago • 7 comments

WIP. Redo of #7914 on top of the modern code. Changes from that PR:

  • Instead of removing the old .case() APIs, I deprecate them here. There are a few tests for them to try to ensure we didn't break anyone.
  • keep both SimpleCase and SearchedCase. In previous PRs I removed SimpleCase. But we should support SimpleCase for more idiomatic/performant SQL. Eg now we still have CASE x WHEN 5 THEN ...
  • Before, x.cases((x==5, 3)) was allowed. Now it is disallowed, it's too clever. If you want to do that, it's an easy transition to use the top-level ibis.cases()
  • Added many more test cases, including for dtypes, dshapes, and expected errors on bad construction
  • added test that ibis.null().cases((None, "fill"), else_="not hit") always results in "not hit". Maybe not the best ergonomics, but at least it is consistent and written down. Perhaps we can revisit later.
  • Added xfail test where it is possible to construct ibis.cases(("not a bool", "someval")). I think we should fix that upstream in the ops.Literal constructor.
  • fixed bug where datashape was only getting determined from base or cases. Really it needs to depend on ALL inputs.
  • added some tests and implementation for dealing with empty branches: ibis.cases() (results in NULL) and ibis.cases(else_=5) (results in 5). I considered disallowing these, but I don't think there is anything semantiically wrong with supporting this.

TODOs that I found that should come in followups:

  • NULL replacement isn't super consistent yet. For example, val.substitute({None: 4}) currently does a fillna(). But if you do val.cases((None, 4), else_=val), then this ALWAYS hits the else_ case, because x = NULL never evals to True. This also isn't even consistent in the sense that it only special cases for python None. If you do ibis.null(), or something only known at runtime like ibis.literal(5).nullif(5), then this will always hit the else_ case. Due to these limitations, I vote for making matching against NULLs out of scope for .cases and .substitute. If a user wants to do this, then they better do a .fillna() before.

NickCrews avatar Apr 22 '24 22:04 NickCrews

OK I think this is ready for review! I can break this into smaller commits if you want, I think I now have all the semantics figured out how I like them.

NickCrews avatar Apr 23 '24 01:04 NickCrews

Can you keep around a test case the deprecated builder API?

kszucs avatar Apr 24 '24 09:04 kszucs

@kszucs I re-added some simple tests how does that look? I can add the tests for all the edge cases, but I would prefer not to. More of the actual functionality (eg datashape and datatype logic) now happens inside the Operation, which both APIs share, so both APIs should be fairly aligned

NickCrews avatar Apr 24 '24 19:04 NickCrews

pyspark is issuing some ResourceWarning for some unrelated reason, is this some separate bug we should address?

NickCrews avatar Apr 24 '24 20:04 NickCrews

oops, just discovered this is breaking for existing users of Value.cases(). Need to add some compat code in there. ugggh it's gonna be annoying.

NickCrews avatar Apr 24 '24 20:04 NickCrews

OK, probably the most complex thing I do here is the arg, kwarg normalization to maintain compatibility with the old API. I feel pretty good about the way I did it, but I would love some red teaming to try to figure out if there are flaws with it.

NickCrews avatar Apr 24 '24 21:04 NickCrews

Is it a bug that impala is returning a nan instead of None? It sure is annoying to have to do

    if exp is None:
        assert pd.isna(result)
    else:
        assert result == exp

just for impala

NickCrews avatar Apr 24 '24 22:04 NickCrews

@kszucs @cpcloud no way we can get this in for 9.0 is there ? The failing CI is unrelated, I think this thing is ready to go after a rebase

NickCrews avatar Apr 30 '24 15:04 NickCrews

It looks like you've branched into ibis-project. This needs to be recreated as a PR coming from your fork.

cpcloud avatar Apr 30 '24 15:04 cpcloud

I think this is too close to the release to get it in for 9.0. We'll probably need to do another non-major release a week or two afterward, so let's get it in first thing after we release 9.

cpcloud avatar Apr 30 '24 15:04 cpcloud

Closing in in favor of https://github.com/ibis-project/ibis/pull/9096 (which is correctly coming from my fork)

NickCrews avatar May 01 '24 17:05 NickCrews