ibis fix: add order_by to first and last

Description of changes

N/A

Issues closed

Resolves #11656

Oct 16 '25 20:10 mesejo

@mesejo Is there a test you can add to ensure this doesn't regress?

Oct 17 '25 12:10 cpcloud

@cpcloud, Thanks for the review, just added a test

Oct 17 '25 19:10 mesejo

@mesejo I just updated the test, let me know if that is acceptable to you! I think using the handcrafted data will make it more clear what we are testing. Also added tests for descending order bys

Oct 17 '25 19:10 NickCrews

LGTM

Oct 17 '25 19:10 NickCrews

@mesejo I just updated the test, let me know if that is acceptable to you! I think using the handcrafted data will make it more clear what we are testing. Also added tests for descending order bys

That's fine with me. Thanks!

Oct 17 '25 20:10 mesejo

@mesejo any idea why druid is failing on my version? On your version, the commit right before, it passed.

Oct 17 '25 20:10 NickCrews

I'm guessing because in your version you only operated over a single column, where in mine we are ordering by a different column than we are aggregating over? Probably just mark as xfail and call it a day?

Oct 17 '25 20:10 NickCrews

@NickCrews, I think it is because of the memtable. The following works for the Druid backend:

@pytest.mark.parametrize(
    "method,expected",
    [
        pytest.param(lambda col: col.first(order_by="bigint_col"), 0, id="first_asc"),
        pytest.param(lambda col: col.last(order_by="bigint_col"), 9, id="last_asc"),
        pytest.param(
            lambda col: col.first(order_by=ibis._.bigint_col.desc()), 9, id="first_desc"
        ),
        pytest.param(
            lambda col: col.last(order_by=ibis._.bigint_col.desc()), 0, id="last_desc"
        ),
    ],
)
def test_first_last_ordered_in_mutate(alltypes, con, method, expected):
    # originally reported in https://github.com/ibis-project/ibis/issues/11656
    t = alltypes
    expr = t.mutate(new=method(t.int_col))
    actual = con.to_pyarrow(expr.new).to_pylist()
    assert actual == [expected] * len(actual)

Oct 17 '25 20:10 mesejo

huh, good detective work, thanks for cleaning up my mess. That is weird, since that seems like such basic usage of memtable, and memtable doesn't appear to be broken otherwise on druid.

I don't love how the test data is so far away from the test (like I have no idea if 0 and 9 are actually the right results without looking up the test data).

But I'm willing to say good enough to get this fix in, but @cpcloud may feel differently.

Oct 17 '25 22:10 NickCrews

@NickCrews, I changed the test to:

t = alltypes.select(
    a=ibis._.tinyint_col, val=ibis._.int_col, ob=ibis._.bigint_col
).filter(
    ((ibis._.val == 4) & (ibis._.ob == 40))
    | ((ibis._.val == 5) & (ibis._.ob == 50))
)
expr = t.mutate(new=method(t.val)).limit(10)
actual = con.to_pyarrow(expr.new).to_pylist()
assert actual == [expected] * 10

This form makes it easy to reason about the shape and content of the data.

Oct 18 '25 08:10 mesejo

That is better, thanks. still not ideal as handwritten data, but if we can't have that, then this really helps

Oct 18 '25 14:10 NickCrews

thanks for submitting a fix!

still not ideal as handwritten data

sure, but this fixes what i'd consider a critical bug (i.e. one where ibis silently gives incorrect results) - any chance it could be merged please, even if the test cases aren't 100% ideal? 🙏

Dec 20 '25 10:12 MarcoGorelli