
Add option to ignore order of fields in array items (but still check schema of array items)

Open • quickdraw6906 opened this issue 10 months ago • 1 comment

If the struct type of an array's items differs only in the ordering of its fields, chispa still throws SchemasNotEqualError, even when ignore_column_order=True is passed:

chispa.assert_df_equality(merge_df, expected_df, ignore_row_order=True, ignore_column_order=True)

chispa.schema_comparer.SchemasNotEqualError:

| StructField('representatives', ArrayType(StructType([StructField('updated_at', StringType(), True), StructField('id', StringType(), True), StructField('aa_id', StringType(), True), StructField('rep_number', DoubleType(), True), StructField('person_info', StructType([StructField('first_name', StringType(), True), StructField('last_name', StringType(), True), StructField('email', StringType(), True)]), True)]), False), False) | StructField('representatives', ArrayType(StructType([StructField('id', StringType(), True), StructField('aa_id', StringType(), True), StructField('rep_number', DoubleType(), True), StructField('updated_at', StringType(), True), StructField('person_info', StructType([StructField('first_name', StringType(), True), StructField('last_name', StringType(), True), StructField('email', StringType(), True)]), True)]), False), False)

Describe the solution you would like

Honor ignore_column_order for StructTypes that are array items.
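One way the requested behaviour could work is to normalize both schemas before comparing them, sorting struct fields by name at every nesting level, including inside array element types. The sketch below is not chispa's implementation. It assumes schemas in the dict form returned by PySpark's `StructType.jsonValue()`, and `normalize_schema` and `field` are hypothetical names used only for illustration.

```python
from typing import Any


def normalize_schema(node: Any) -> Any:
    """Return a copy of a schema dict with struct fields sorted by name at every level."""
    if not isinstance(node, dict):
        return node  # primitive type name such as "string" or "double"
    kind = node.get("type")
    if kind == "struct":
        fields = [
            {**f, "type": normalize_schema(f["type"])} for f in node["fields"]
        ]
        return {**node, "fields": sorted(fields, key=lambda f: f["name"])}
    if kind == "array":
        # Recurse into the element type so structs inside arrays are sorted too.
        return {**node, "elementType": normalize_schema(node["elementType"])}
    if kind == "map":
        return {
            **node,
            "keyType": normalize_schema(node["keyType"]),
            "valueType": normalize_schema(node["valueType"]),
        }
    return node


def field(name: str, type_: Any, nullable: bool = True) -> dict:
    """Hypothetical helper: build one field entry in jsonValue() form."""
    return {"name": name, "type": type_, "nullable": nullable, "metadata": {}}


def representatives(item_fields: list) -> dict:
    """Top-level schema with a single array-of-struct column, as in the report."""
    item = {"type": "struct", "fields": item_fields}
    array = {"type": "array", "elementType": item, "containsNull": False}
    return {"type": "struct", "fields": [field("representatives", array, False)]}


actual = representatives(
    [field("updated_at", "string"), field("id", "string"), field("rep_number", "double")]
)
expected = representatives(
    [field("id", "string"), field("rep_number", "double"), field("updated_at", "string")]
)

# The raw schemas differ only in field order inside the array items.
schemas_match = normalize_schema(actual) == normalize_schema(expected)
```

Because only the order of fields is changed, a genuine difference in a field's type or nullability still makes the normalized schemas unequal.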

quickdraw6906, Mar 02 '25 03:03

As a workaround, I exploded the array and flattened the schema, after which the DataFrames compare fine:

```python
import chispa
from pyspark.sql import functions as F

# flatten_schema is a helper that is not shown in this post.

# Explode the array so each item becomes a struct column, then flatten it.
merge_df = merge_df.withColumn("representatives_check", F.explode("representatives")).drop("representatives")
select_exprs, nesting_info = flatten_schema(merge_df.schema, dot_prefix="", flat_prefix="")
merge_df = merge_df.select(*select_exprs)

# Apply the same transformation to the expected DataFrame.
expected_df = expected_df.withColumn("representatives_check", F.explode("representatives")).drop("representatives")
select_exprs, nesting_info = flatten_schema(expected_df.schema, dot_prefix="", flat_prefix="")
expected_df = expected_df.select(*select_exprs)

chispa.assert_df_equality(merge_df, expected_df, ignore_row_order=True, ignore_column_order=True)
```
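The flattening step is what makes the workaround succeed: once every nested field is a top-level column, ignore_column_order applies to all of them. The post does not show the flattening helper, so the following is only a rough sketch of the idea, not that helper. It assumes a schema in the dict form returned by PySpark's `StructType.jsonValue()`, and `leaf_paths` and `field` are hypothetical names.

```python
from typing import Any, List, Tuple


def leaf_paths(node: dict, prefix: str = "") -> List[Tuple[str, str]]:
    """List (dotted_path, type_name) for every non-struct leaf under a struct schema dict."""
    paths: List[Tuple[str, str]] = []
    for f in node["fields"]:
        path = f"{prefix}{f['name']}"
        ftype = f["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            # Descend into nested structs, extending the dotted path.
            paths.extend(leaf_paths(ftype, prefix=path + "."))
        else:
            type_name = ftype if isinstance(ftype, str) else ftype["type"]
            paths.append((path, type_name))
    return sorted(paths)  # sorting makes the result independent of field order


def field(name: str, type_: Any) -> dict:
    """Hypothetical helper: build one field entry in jsonValue() form."""
    return {"name": name, "type": type_, "nullable": True, "metadata": {}}


def exploded(item_fields: list) -> dict:
    """Schema after the explode step: one struct column named representatives_check."""
    return {
        "type": "struct",
        "fields": [field("representatives_check", {"type": "struct", "fields": item_fields})],
    }


person = {"type": "struct", "fields": [field("first_name", "string"), field("email", "string")]}
actual = exploded([field("updated_at", "string"), field("id", "string"), field("person_info", person)])
expected = exploded([field("id", "string"), field("updated_at", "string"), field("person_info", person)])

actual_paths = leaf_paths(actual)
expected_paths = leaf_paths(expected)
```

With PySpark, each dotted path could then be selected as a top-level column, for example `F.col(path).alias(path.replace(".", "_"))`, before calling assert_df_equality.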

quickdraw6906, Mar 02 '25 03:03