
Handle nested nullability

Open machielg opened this issue 3 years ago • 4 comments

When using ignore_nullable=True, chispa still reports a difference for an ArrayType column because the nullability of the inner element type differs:

StructField(my_arr_col,ArrayType(StringType,false),false)
StructField(my_arr_col,ArrayType(StringType,true),true)

machielg avatar Mar 29 '22 11:03 machielg

yeah I'm having the same problem. I've had to abandon this library when testing dataframe equality with nested/complex datatypes.

etlundquist avatar May 19 '22 22:05 etlundquist

@machielg Is this still an issue? The test below treats the two schemas as equal and prints True:

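from pyspark.sql.types import ArrayType, StringType, StructField, StructType
from chispa.schema_comparer import are_schemas_equal_ignore_nullable

# Same column; only the nullability of the field and of the array elements differs.
as1 = StructType([StructField("ar", ArrayType(StringType(), False), False)])
as2 = StructType([StructField("ar", ArrayType(StringType(), True), True)])
print(are_schemas_equal_ignore_nullable(as1, as2))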

True

orcascope avatar Oct 23 '22 16:10 orcascope

But if the elementType of the ArrayType is a complex StructType, nullability differences inside that struct are still compared. The check below prints False.

def test_schema_nullability_insensitive_comparisons_with_arrays():
    s1 = StructType([StructField("f1", ArrayType(IntegerType(), True), True),
                     StructField("f2", ArrayType(
                         StructType([StructField("latlong", IntegerType(), False),
                         StructField("price", ArrayType(IntegerType(), False), False)]), True), True)])

    s2 = StructType([StructField("f1", ArrayType(IntegerType(), True), True),
                     StructField("f2", ArrayType(
                         StructType([StructField("latlong", IntegerType(), True),
                         StructField("price", ArrayType(IntegerType(), True), True)]), True), True)])

    print(are_schemas_equal_ignore_nullable(s1, s2))

@MrPowers Please check whether the string equality check can be used in this case, as in https://github.com/orcascope/chispa/commit/ea5d61cf01c44bc6a9f7436bc4c54ae6d622dcd2

orcascope avatar Oct 23 '22 16:10 orcascope
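
For illustration only, here is a minimal sketch of one way to compare schemas while ignoring nullability at every nesting level. It is not chispa's implementation and not the approach in the linked commit. It assumes the schemas are in the dict form that PySpark's `StructType.jsonValue()` returns, where nullability is stored under the keys `nullable`, `containsNull`, and `valueContainsNull`. The helpers `field`, `array`, `struct`, `strip_nullability`, and `schemas_equal_ignoring_nullability` are hypothetical names introduced for this example; the first three only mimic that dict form so the sketch runs without Spark.

```python
# Keys under which Spark's JSON schema form records nullability.
NULLABILITY_KEYS = {"nullable", "containsNull", "valueContainsNull"}


def strip_nullability(node):
    """Return a copy of a schema dict with nullability flags removed at every level."""
    if isinstance(node, dict):
        return {
            # Field metadata is user data, so it is copied through unchanged.
            key: (value if key == "metadata" else strip_nullability(value))
            for key, value in node.items()
            if key not in NULLABILITY_KEYS
        }
    if isinstance(node, list):
        return [strip_nullability(item) for item in node]
    return node  # primitive type names such as "integer"


def schemas_equal_ignoring_nullability(schema1, schema2):
    """Compare two schema dicts after dropping all nullability flags."""
    return strip_nullability(schema1) == strip_nullability(schema2)


# Hypothetical helpers that mimic the dict layout of jsonValue().
def field(name, dtype, nullable):
    return {"name": name, "type": dtype, "nullable": nullable, "metadata": {}}


def array(element_type, contains_null):
    return {"type": "array", "elementType": element_type, "containsNull": contains_null}


def struct(*fields):
    return {"type": "struct", "fields": list(fields)}


# The nested "f2" column from the test above: an array of structs whose
# inner fields differ only in nullability.
s1 = struct(field("f2", array(struct(
    field("latlong", "integer", False),
    field("price", array("integer", False), False)), True), True))
s2 = struct(field("f2", array(struct(
    field("latlong", "integer", True),
    field("price", array("integer", True), True)), True), True))

# Same shape as s1, but "latlong" has a different type.
s3 = struct(field("f2", array(struct(
    field("latlong", "string", False),
    field("price", array("integer", False), False)), True), True))

print(schemas_equal_ignoring_nullability(s1, s2))
print(schemas_equal_ignoring_nullability(s1, s3))
```

With real PySpark schemas, the same function could be called as `schemas_equal_ignoring_nullability(df1.schema.jsonValue(), df2.schema.jsonValue())`. Because the flags are dropped recursively, a struct nested inside an array (or a map) is handled the same way as a top-level field, while differences in names, types, or field order still make the comparison fail.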