
ignore_schema in assert_df_equality removed in 9.3?

Open MathiasHolmstrom opened this issue 1 year ago • 6 comments

I used this parameter in 9.2 but it's no longer there in 9.3. Why was this removed and does it mean I can't perform unit-tests without comparing types any longer?

MathiasHolmstrom, Sep 18 '23 14:09

Yea, we had to remove this because it was a bad addition to the library (it didn't make sense after I thought about it deeper). Can you give me a better idea of what you're trying to accomplish, so I can see if it's possible with chispa or if the library should be modified? Thank you.

MrPowers, Sep 26 '23 04:09

I am comparing two DataFrames and don't care about the types of the columns. In that case I want the assertion to pass even if the types are different. Is there another way of accomplishing this behavior?

MathiasHolmstrom, Sep 26 '23 08:09

@Hiderdk - yea this should work: chispa.assert_basic_rows_equality(df1.collect(), df2.collect()). Let me know if that works for you.

MrPowers, Sep 30 '23 01:09

@MrPowers first of all, thanks for the wonderful library. Why did you decide to change the API of this assertion in the minor version bump of the package?

This caused our tests to break. The convention is that minor version bumps don't change the API, so package managers (like poetry) update dependencies to the latest minor version.

ivanychev, Oct 13 '23 16:10

We used this option a lot because we don't really care whether the column is IntegerType or LongType but we do want to compare Spark DataFrames and use other comparator options of assert_df_equality. Without it, we will need to add some boilerplate type casting in the test code to make tests work again. It's a petty that you decided to remove it.

ivanychev, Oct 13 '23 16:10

@ivanychev - yea, I have the work-around that will meet your use case above.

> Why did you decide to change the API of this assertion in the minor version bump of the package?

We're using Semantic Versioning 2.0. Per the spec: "Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable."

> It's a petty that you decided to remove it.

No, this wasn't petty. This option was causing bugs and breaking workflows. We needed to remove it. I do my best to make all changes backwards compatible. This one absolutely needed to be removed because it was causing lots of issues.

> We used this option a lot because we don't really care whether the column is IntegerType or LongType but we do want to compare Spark DataFrames and use other comparator options of assert_df_equality

Feel free to propose another abstraction that's not breaking, not buggy, and will be a good addition for the entire chispa community 🚀

MrPowers, Oct 13 '23 16:10