ibis
feat: assert_tables_equal()
Is your feature request related to a problem?
When writing unit tests that use ibis, it is annoying to assert that two tables (including their data!) are equal. It would be super nice if ibis had a function similar to pandas.testing.assert_frame_equal().
I have a simple version of this at https://github.com/NickCrews/mismo/blob/df57a37642edeb41d945a000b1f5a6228b4d72c1/mismo/tests/util.py. Would definitely want to adjust the API, adding lots more options for how to check for type equality, etc. This could get used internally in ibis's tests, but also it should be a public API for users. I figure that if we use it in ibis's internal tests for a while, we can iron out most of the kinks and get a stable API.
Describe the solution you'd like
See above
What version of ibis are you running?
main
What backend(s) are you using, if any?
No response
@NickCrews are you aware of table.equals(other_table) and does that suffice?
[ins] In [6]: t.equals(t)
Out[6]: True
I guess this won't actually check that the data is the same based on the docstring though
[ins] In [5]: t.equals?
Signature: t.equals(other)
Docstring:
Return whether this expression is _structurally_ equivalent to `other`.
If you want to produce an equality expression, use `==` syntax.
Parameters
----------
other
Another expression
Examples
--------
>>> import ibis
>>> t1 = ibis.table(dict(a="int"), name="t")
>>> t2 = ibis.table(dict(a="int"), name="t")
>>> t1.equals(t2)
True
>>> v = ibis.table(dict(a="string"), name="v")
>>> t1.equals(v)
False
ahh, that is for abstract structural equality. I am looking for the actual bound data to be the same. Updated my original post to be more clear.
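To make the distinction concrete, here is a rough sketch of what a data-level check could look like. It is not an ibis API: `assert_rows_equal` is a hypothetical helper, and it assumes both tables have already been materialized into lists of row dicts (for example via `table.to_pyarrow().to_pylist()`). It compares strictly, so row order and column order both matter.

```python
def assert_rows_equal(left, right):
    """Hypothetical helper: strictly compare two tables given as lists of row dicts."""
    if len(left) != len(right):
        raise AssertionError(f"row count differs: {len(left)} != {len(right)}")
    for i, (lrow, rrow) in enumerate(zip(left, right)):
        # Column names must match, in the same order.
        if list(lrow) != list(rrow):
            raise AssertionError(
                f"row {i}: columns differ: {list(lrow)} != {list(rrow)}"
            )
        # Values must match column by column.
        for col in lrow:
            if lrow[col] != rrow[col]:
                raise AssertionError(
                    f"row {i}, column {col!r}: {lrow[col]!r} != {rrow[col]!r}"
                )


expected = [{"name": "Alice", "birthdate": "1990-01-01"}]
actual = [{"name": "Alice", "birthdate": "1990-01-01"}]
assert_rows_equal(actual, expected)  # identical data, so nothing is raised
```

A real implementation would presumably also need options for type checking and float tolerance, which this sketch leaves out.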
Agreed. I'm also trying to write unit tests with Ibis tables, and I could not figure out how to do it using ibis only (that is, without relying on the .equals() of the pandas DataFrame returned by ibis's .execute() method).
Even the definition of "structurally equivalent" in the .equals() docstring is too abstract, IMHO. What does "structurally equivalent" refer to? The entire schema? Data types only? Does the order of columns/rows matter?
From my understanding, this code should give me True instead of False:
import ibis

data = [
    {"name": "Alice", "birthdate": "1990-01-01"},
    {"name": "Bob", "birthdate": "1985-05-15"},
    {"name": "Charlie", "birthdate": "1992-08-23"},
]
schema = ibis.schema(
    {
        "name": "string",
        "birthdate": "date",
    }
)
memtable1 = ibis.memtable(data, schema=schema)
memtable2 = memtable1.rename({"name": "name"})  # we've kept the same structure/schema/types, right?
memtable1.equals(memtable2)  # False :/
For the "structurally" part, my understanding is that it's comparing the lineage of transformations (not sure):
> memtable1
InMemoryTable
data:
PandasDataFrameProxy:
name birthdate
0 Alice 1990-01-01
1 Bob 1985-05-15
2 Charlie 1992-08-23
> memtable2
r0 := InMemoryTable
data:
PandasDataFrameProxy:
name birthdate
0 Alice 1990-01-01
1 Bob 1985-05-15
2 Charlie 1992-08-23
Project[r0]
name: r0.name
birthdate: r0.birthdate
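If that reading is right, the behaviour can be illustrated with a toy model. The classes below are made up for illustration and are not ibis internals; they only show why comparing expression trees reports a difference when an extra node wraps the same data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InMemoryTable:
    # Toy stand-in for a leaf node holding the data.
    rows: tuple


@dataclass(frozen=True)
class Project:
    # Toy stand-in for a projection node that wraps a parent node.
    parent: object
    columns: tuple


base = InMemoryTable(rows=(("Alice", "1990-01-01"), ("Bob", "1985-05-15")))
projected = Project(parent=base, columns=("name", "birthdate"))

# Same node type and same fields: the trees are equal.
print(base == InMemoryTable(rows=base.rows))  # True
# A Project wrapping the same data is a different tree, so they are not equal.
print(base == projected)  # False
```

Under this model, a no-op rename still adds a node, so a tree comparison returns False even though the resulting data is identical.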
As I'm used to using chispa for pyspark unit tests, their way of checking dataframe equality looks good enough for me:
assert_df_equality(df1, df2, ignore_column_order=True, ignore_row_order=True)
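As a sketch of what those two flags could mean for ibis tables, the hypothetical `normalize_rows` helper below puts materialized rows (lists of dicts, e.g. from `table.to_pyarrow().to_pylist()`) into a canonical form before comparing them. The function name and parameters are assumptions borrowed from the chispa call above, not an existing ibis API.

```python
def normalize_rows(rows, ignore_column_order=False, ignore_row_order=False):
    """Hypothetical helper: turn a list of row dicts into a comparable list of tuples."""
    normalized = []
    for row in rows:
        items = list(row.items())
        if ignore_column_order:
            # Sort (column, value) pairs by column name.
            items.sort(key=lambda kv: kv[0])
        normalized.append(tuple(items))
    if ignore_row_order:
        # Sort by repr so that None and mixed types do not break the sort.
        normalized.sort(key=repr)
    return normalized


left = [{"name": "Alice", "age": 1}, {"name": "Bob", "age": 2}]
right = [{"age": 2, "name": "Bob"}, {"age": 1, "name": "Alice"}]

# Strict comparison sees different column and row order.
assert normalize_rows(left) != normalize_rows(right)
# With both flags set, the two tables compare equal.
assert normalize_rows(left, True, True) == normalize_rows(right, True, True)
```

Sorting by `repr` is a shortcut: values that compare equal but print differently (such as `1` and `1.0`) may need extra care in a real implementation.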
Note: using ibis-framework==10.1.0