ibis icon indicating copy to clipboard operation
ibis copied to clipboard

fix: add order_by to first and last

Open mesejo opened this issue 4 months ago • 11 comments

Description of changes

N/A

Issues closed

  • Resolves #11656

mesejo avatar Oct 16 '25 20:10 mesejo

@mesejo Is there a test you can add to ensure this doesn't regress?

cpcloud avatar Oct 17 '25 12:10 cpcloud

@cpcloud, Thanks for the review, just added a test

mesejo avatar Oct 17 '25 19:10 mesejo

@mesejo I just updated the test, let me know if that is acceptable to you! I think using the handcrafted data will make it more clear what we are testing. Also added tests for descending order bys

NickCrews avatar Oct 17 '25 19:10 NickCrews

LGTM

NickCrews avatar Oct 17 '25 19:10 NickCrews

@mesejo I just updated the test, let me know if that is acceptable to you! I think using the handcrafted data will make it more clear what we are testing. Also added tests for descending order bys

That's fine with me. Thanks!

mesejo avatar Oct 17 '25 20:10 mesejo

@mesejo any idea why druid is failing on my version? On your version, the commit right before, it passed.

NickCrews avatar Oct 17 '25 20:10 NickCrews

I'm guessing because in your version you only operated over a single column, where in mine we are ordering by a different column than we are aggregating over? Probably just mark as xfail and call it a day?

NickCrews avatar Oct 17 '25 20:10 NickCrews

@NickCrews, I think it is because of the memtable. The following works for the Druid backend:

@pytest.mark.parametrize(
    "method,expected",
    [
        pytest.param(lambda col: col.first(order_by="bigint_col"), 0, id="first_asc"),
        pytest.param(lambda col: col.last(order_by="bigint_col"), 9, id="last_asc"),
        pytest.param(
            lambda col: col.first(order_by=ibis._.bigint_col.desc()), 9, id="first_desc"
        ),
        pytest.param(
            lambda col: col.last(order_by=ibis._.bigint_col.desc()), 0, id="last_desc"
        ),
    ],
)
def test_first_last_ordered_in_mutate(alltypes, con, method, expected):
    # originally reported in https://github.com/ibis-project/ibis/issues/11656
    t = alltypes
    expr = t.mutate(new=method(t.int_col))
    actual = con.to_pyarrow(expr.new).to_pylist()
    assert actual == [expected] * len(actual)

mesejo avatar Oct 17 '25 20:10 mesejo

huh, good detective work, thanks for cleaning up my mess. That is weird, since that seems like such basic usage of memtable, and memtable doesn't appear to be broken otherwise on druid.

I don't love how the test data is so far away from the test (like I have no idea if 0 and 9 are actually the right results without looking up the test data).

But I'm willing to say good enough to get this fix in, but @cpcloud may feel differently.

NickCrews avatar Oct 17 '25 22:10 NickCrews

@NickCrews, I changed the test to:

t = alltypes.select(
    a=ibis._.tinyint_col, val=ibis._.int_col, ob=ibis._.bigint_col
).filter(
    ((ibis._.val == 4) & (ibis._.ob == 40))
    | ((ibis._.val == 5) & (ibis._.ob == 50))
)
expr = t.mutate(new=method(t.val)).limit(10)
actual = con.to_pyarrow(expr.new).to_pylist()
assert actual == [expected] * 10 

This form makes it easy to reason about the shape and content of the data.

mesejo avatar Oct 18 '25 08:10 mesejo

That is better, thanks. still not ideal as handwritten data, but if we can't have that, then this really helps

NickCrews avatar Oct 18 '25 14:10 NickCrews

thanks for submitting a fix!

still not ideal as handwritten data

sure, but this fixes what i'd consider a critical bug (i.e. one where ibis silently gives incorrect results) - any chance it could be merged please, even if the test cases aren't 100% ideal? 🙏

MarcoGorelli avatar Dec 20 '25 10:12 MarcoGorelli