feat: assert_tables_equal()

Open NickCrews opened this issue 1 year ago • 3 comments

Is your feature request related to a problem?

When writing unit tests that use ibis, it is annoying to assert that two tables (including their data!) are equal. It would be super nice if ibis had a function similar to pandas.testing.assert_frame_equal().

I have a simple version of this at https://github.com/NickCrews/mismo/blob/df57a37642edeb41d945a000b1f5a6228b4d72c1/mismo/tests/util.py. Would definitely want to adjust the API, adding lots more options for how to check for type equality, etc. This could get used internally in ibis's tests, but also it should be a public API for users. I figure that if we use it in ibis's internal tests for a while, we can iron out most of the kinks and get a stable API.

Describe the solution you'd like

See above

What version of ibis are you running?

main

What backend(s) are you using, if any?

No response

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

NickCrews avatar Mar 11 '24 19:03 NickCrews

@NickCrews are you aware of table.equals(other_table) and does that suffice?

[ins] In [6]: t.equals(t)
Out[6]: True

Based on the docstring, though, I guess this won't actually check that the data is the same:

[ins] In [5]: t.equals?
Signature: t.equals(other)
Docstring:
Return whether this expression is _structurally_ equivalent to `other`.

If you want to produce an equality expression, use `==` syntax.

Parameters
----------
other
    Another expression

Examples
--------
>>> import ibis
>>> t1 = ibis.table(dict(a="int"), name="t")
>>> t2 = ibis.table(dict(a="int"), name="t")
>>> t1.equals(t2)
True
>>> v = ibis.table(dict(a="string"), name="v")
>>> t1.equals(v)
False

lostmygithubaccount avatar Mar 11 '24 19:03 lostmygithubaccount

ahh, that is for abstract structural equality. I am looking for the actual bound data to be the same. Updated my original post to be more clear.

NickCrews avatar Mar 11 '24 20:03 NickCrews

Agreed: I'm also trying to write unit tests with Ibis tables, and I could not figure out how to do it using ibis only (that is, without relying on .equals() on the pandas DataFrame returned by ibis's .execute() method).

Even the definition of "structurally equivalent" in the .equals() docstring is too abstract, IMHO. What does "structurally equivalent" refer to? The entire schema? Data types only? Does the order of columns/rows matter?

From my understanding, this code should give me True instead of False:

import ibis
data = [
    {"name": "Alice", "birthdate": "1990-01-01"},
    {"name": "Bob", "birthdate": "1985-05-15"},
    {"name": "Charlie", "birthdate": "1992-08-23"},
]
schema = ibis.schema(
    {
        "name": "string",
        "birthdate": "date",
    }
)
memtable1 = ibis.memtable(data, schema=schema)

memtable2 = memtable1.rename({"name": "name"}) # we've kept the same structure/schema/types, right?

memtable1.equals(memtable2) # False :/

For the "structurally" part, my understanding is that it's comparing the lineage of transformations (not sure):

> memtable1

InMemoryTable
  data:
    PandasDataFrameProxy:
            name   birthdate
      0    Alice  1990-01-01
      1      Bob  1985-05-15
      2  Charlie  1992-08-23
> memtable2

r0 := InMemoryTable
  data:
    PandasDataFrameProxy:
            name   birthdate
      0    Alice  1990-01-01
      1      Bob  1985-05-15
      2  Charlie  1992-08-23

Project[r0]
  name:      r0.name
  birthdate: r0.birthdate
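
If .equals() really does compare expression trees, as the two reprs above suggest, the False result can be reproduced with a toy model. The sketch below is not ibis code: InMemoryTable and Project here are hypothetical plain dataclasses that only borrow their names from the repr output, and the comparison rule is an assumption made for illustration.

```python
from dataclasses import dataclass


# Toy stand-ins for expression nodes; NOT ibis internals.
@dataclass(frozen=True)
class InMemoryTable:
    rows: tuple


@dataclass(frozen=True)
class Project:
    parent: object  # the node this projection selects from
    fields: tuple   # output column names


base = InMemoryTable(rows=(("Alice", "1990-01-01"), ("Bob", "1985-05-15")))

# A no-op rename still wraps the table in an extra Project node.
renamed = Project(parent=base, fields=("name", "birthdate"))

# Dataclass equality is structural: different node types never compare equal,
# even though both describe the same rows and columns.
print(base == renamed)         # False
print(renamed.parent == base)  # True
```

Under that reading, "structurally equivalent" would mean "built from the same tree of operations", which says nothing about whether two different trees produce the same data.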

As I'm used to using chispa for pyspark unit tests, their way of checking dataframe equality looks good enough for me:

assert_df_equality(df1, df2, ignore_column_order=True, ignore_row_order=True)
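
To make the requested semantics concrete, here is a minimal sketch of a data-level comparison that ignores column and row order. It is not an ibis or chispa API: assert_rows_equal is a hypothetical helper that works on plain lists of row dicts and assumes every cell value is hashable. In principle such rows could come from something like table.to_pyarrow().to_pylist(), but that wiring is not shown or tested here.

```python
from collections import Counter


def assert_rows_equal(left, right, *, ignore_column_order=True, ignore_row_order=True):
    """Raise AssertionError unless two lists of row dicts hold the same data."""

    def normalize(row):
        items = row.items()
        # Sorting by column name makes {"a": 1, "b": 2} match {"b": 2, "a": 1}.
        return tuple(sorted(items)) if ignore_column_order else tuple(items)

    left_rows = [normalize(r) for r in left]
    right_rows = [normalize(r) for r in right]

    if ignore_row_order:
        # Counter compares as a multiset, so duplicated rows are still counted.
        left_cmp, right_cmp = Counter(left_rows), Counter(right_rows)
    else:
        left_cmp, right_cmp = left_rows, right_rows

    assert left_cmp == right_cmp, f"rows differ:\n left={left_rows}\nright={right_rows}"


rows_a = [
    {"name": "Alice", "birthdate": "1990-01-01"},
    {"name": "Bob", "birthdate": "1985-05-15"},
]
rows_b = [
    {"birthdate": "1985-05-15", "name": "Bob"},
    {"birthdate": "1990-01-01", "name": "Alice"},
]

# Same data, different column and row order: passes with the defaults.
assert_rows_equal(rows_a, rows_b)
```

Calling assert_rows_equal(rows_a, rows_b, ignore_row_order=False) raises AssertionError, since the rows appear in a different order.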

Note: using ibis-framework==10.1.0

filipeo2-mck avatar Mar 19 '25 15:03 filipeo2-mck