fix: add order_by to first and last
Description of changes
N/A
Issues closed
- Resolves #11656
@mesejo Is there a test you can add to ensure this doesn't regress?
@cpcloud, Thanks for the review, just added a test
@mesejo I just updated the test, let me know if that is acceptable to you! I think using the handcrafted data will make it more clear what we are testing. Also added tests for descending order bys
LGTM
@mesejo I just updated the test, let me know if that is acceptable to you! I think using the handcrafted data will make it more clear what we are testing. Also added tests for descending order bys
That's fine with me. Thanks!
@mesejo any idea why druid is failing on my version? On your version, the commit right before, it passed.
I'm guessing because in your version you only operated over a single column, where in mine we are ordering by a different column than we are aggregating over? Probably just mark as xfail and call it a day?
@NickCrews, I think it is because of the memtable. The following works for the Druid backend:
@pytest.mark.parametrize(
"method,expected",
[
pytest.param(lambda col: col.first(order_by="bigint_col"), 0, id="first_asc"),
pytest.param(lambda col: col.last(order_by="bigint_col"), 9, id="last_asc"),
pytest.param(
lambda col: col.first(order_by=ibis._.bigint_col.desc()), 9, id="first_desc"
),
pytest.param(
lambda col: col.last(order_by=ibis._.bigint_col.desc()), 0, id="last_desc"
),
],
)
def test_first_last_ordered_in_mutate(alltypes, con, method, expected):
# originally reported in https://github.com/ibis-project/ibis/issues/11656
t = alltypes
expr = t.mutate(new=method(t.int_col))
actual = con.to_pyarrow(expr.new).to_pylist()
assert actual == [expected] * len(actual)
huh, good detective work, thanks for cleaning up my mess. That is weird, since that seems like such basic usage of memtable, and memtable doesn't appear to be broken otherwise on druid.
I don't love how the test data is so far away from the test (like I have no idea if 0 and 9 are actually the right results without looking up the test data).
But I'm willing to say good enough to get this fix in, but @cpcloud may feel differently.
@NickCrews, I changed the test to:
t = alltypes.select(
a=ibis._.tinyint_col, val=ibis._.int_col, ob=ibis._.bigint_col
).filter(
((ibis._.val == 4) & (ibis._.ob == 40))
| ((ibis._.val == 5) & (ibis._.ob == 50))
)
expr = t.mutate(new=method(t.val)).limit(10)
actual = con.to_pyarrow(expr.new).to_pylist()
assert actual == [expected] * 10
This form makes it easy to reason about the shape and content of the data.
That is better, thanks. still not ideal as handwritten data, but if we can't have that, then this really helps
thanks for submitting a fix!
still not ideal as handwritten data
sure, but this fixes what i'd consider a critical bug (i.e. one where ibis silently gives incorrect results) - any chance it could be merged please, even if the test cases aren't 100% ideal? 🙏