Add option to ignore order of array items (but check schema of array items)

Open quickdraw6906 opened this issue 10 months ago • 1 comments

If the structure of array items differs only in the ordering of columns, chispa still throws SchemasNotEqualError when ignore_column_order is passed.

chispa.assert_df_equality(merge_df, expected_df, ignore_row_order=True, ignore_column_order=True)

chispa.schema_comparer.SchemasNotEqualError:

| StructField('representatives', ArrayType(StructType([StructField('updated_at', StringType(), True), StructField('id', StringType(), True), StructField('aa_id', StringType(), True), StructField('rep_number', DoubleType(), True), StructField('person_info', StructType([StructField('first_name', StringType(), True), StructField('last_name', StringType(), True), StructField('email', StringType(), True)]), True)]), False), False) | StructField('representatives', ArrayType(StructType([StructField('id', StringType(), True), StructField('aa_id', StringType(), True), StructField('rep_number', DoubleType(), True), StructField('updated_at', StringType(), True), StructField('person_info', StructType([StructField('first_name', StringType(), True), StructField('last_name', StringType(), True), StructField('email', StringType(), True)]), True)]), False), False)

Describe the solution you would like

Honor ignore_column_order for StructTypes that are array items.
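To illustrate the requested behavior, here is a minimal sketch of an order-insensitive schema comparison. It is not chispa's implementation. It works on the plain dict form that PySpark's `StructType.jsonValue()` returns, so it runs without a Spark session. The names `normalize`, `field`, and `struct` are hypothetical helpers made up for this example.

```python
def normalize(dtype):
    """Return a copy of a schema JSON value with struct fields sorted by name at every level."""
    if not isinstance(dtype, dict):
        return dtype  # primitive type names such as "string" or "double"
    kind = dtype.get("type")
    if kind == "struct":
        fields = [{**f, "type": normalize(f["type"])} for f in dtype["fields"]]
        return {**dtype, "fields": sorted(fields, key=lambda f: f["name"])}
    if kind == "array":
        # Recurse into the element type so structs inside arrays are sorted too.
        return {**dtype, "elementType": normalize(dtype["elementType"])}
    if kind == "map":
        return {**dtype,
                "keyType": normalize(dtype["keyType"]),
                "valueType": normalize(dtype["valueType"])}
    return dtype


# Hypothetical helpers that build the same dict layout jsonValue() produces.
def field(name, dtype, nullable=True):
    return {"name": name, "type": dtype, "nullable": nullable, "metadata": {}}


def struct(*fields):
    return {"type": "struct", "fields": list(fields)}


person_info = struct(field("first_name", "string"),
                     field("last_name", "string"),
                     field("email", "string"))


def representatives(*item_fields):
    array = {"type": "array", "elementType": struct(*item_fields), "containsNull": False}
    return struct(field("representatives", array, nullable=False))


# The two schemas from the error above: same fields, "updated_at" in a different position.
actual = representatives(field("updated_at", "string"), field("id", "string"),
                         field("aa_id", "string"), field("rep_number", "double"),
                         field("person_info", person_info))
expected = representatives(field("id", "string"), field("aa_id", "string"),
                           field("rep_number", "double"), field("updated_at", "string"),
                           field("person_info", person_info))

print(actual == expected)                        # order-sensitive comparison
print(normalize(actual) == normalize(expected))  # order-insensitive comparison
```

With real DataFrames, the inputs would presumably be `merge_df.schema.jsonValue()` and `expected_df.schema.jsonValue()`. Note that this only covers the schema check: the struct values inside the array would still need their fields reordered before the row data could be compared positionally.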

Mar 02 '25 03:03 quickdraw6906

I exploded the array and flattened the schema, and the DataFrames compare fine.

```python
import chispa
from pyspark.sql import functions as F

# flatten_schema is my own helper (not shown here); it returns the select
# expressions that turn nested struct fields into top-level columns.
merge_df = merge_df.withColumn("representatives_check", F.explode("representatives")).drop("representatives")
select_exprs, nesting_info = flatten_schema(merge_df.schema, dot_prefix="", flat_prefix="")
merge_df = merge_df.select(*select_exprs)

expected_df = expected_df.withColumn("representatives_check", F.explode("representatives")).drop("representatives")
select_exprs, nesting_info = flatten_schema(expected_df.schema, dot_prefix="", flat_prefix="")
expected_df = expected_df.select(*select_exprs)

chispa.assert_df_equality(merge_df, expected_df, ignore_row_order=True, ignore_column_order=True)
```

Mar 02 '25 03:03 quickdraw6906
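The `flatten_schema` helper used in the workaround is not shown in this thread. As a rough idea of what such a flattening step might compute, here is a sketch that walks the dict form of a schema (as returned by `StructType.jsonValue()`) and lists every leaf field with a flat column name. `flatten_paths` and the underscore naming are assumptions made for illustration, not the original helper or its signature.

```python
def flatten_paths(schema, prefix=""):
    """Return (dotted_path, flat_name) pairs for every non-struct leaf under a struct."""
    pairs = []
    for f in schema["fields"]:
        path = prefix + f["name"]
        ftype = f["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            # Descend into nested structs, extending the dotted path.
            pairs.extend(flatten_paths(ftype, prefix=path + "."))
        else:
            pairs.append((path, path.replace(".", "_")))
    return pairs


# Hypothetical helpers that build the same dict layout jsonValue() produces.
def field(name, dtype, nullable=True):
    return {"name": name, "type": dtype, "nullable": nullable, "metadata": {}}


def struct(*fields):
    return {"type": "struct", "fields": list(fields)}


# Schema after exploding "representatives" into "representatives_check".
exploded = struct(field("representatives_check", struct(
    field("id", "string"),
    field("aa_id", "string"),
    field("rep_number", "double"),
    field("updated_at", "string"),
    field("person_info", struct(field("first_name", "string"),
                                field("last_name", "string"),
                                field("email", "string"))),
)))

for dotted, flat in flatten_paths(exploded):
    print(dotted, "->", flat)
```

In PySpark, the pairs could then be applied with something like `df.select(*[F.col(p).alias(a) for p, a in flatten_paths(df.schema.jsonValue())])`. Once every leaf is a top-level column, `ignore_column_order=True` can handle the ordering. Two caveats: `F.explode` drops rows whose array is null or empty (`F.explode_outer` keeps them), and the underscore naming can collide if field names already contain underscores.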