Add option to ignore order of array items (but check schema of array items)
If the struct items of an array column differ only in the order of their fields, chispa still throws SchemasNotEqualError, even when ignore_column_order=True is passed:
chispa.assert_df_equality(merge_df, expected_df, ignore_row_order=True, ignore_column_order=True)
chispa.schema_comparer.SchemasNotEqualError:
| StructField('representatives', ArrayType(StructType([StructField('updated_at', StringType(), True), StructField('id', StringType(), True), StructField('aa_id', StringType(), True), StructField('rep_number', DoubleType(), True), StructField('person_info', StructType([StructField('first_name', StringType(), True), StructField('last_name', StringType(), True), StructField('email', StringType(), True)]), True)]), False), False) | StructField('representatives', ArrayType(StructType([StructField('id', StringType(), True), StructField('aa_id', StringType(), True), StructField('rep_number', DoubleType(), True), StructField('updated_at', StringType(), True), StructField('person_info', StructType([StructField('first_name', StringType(), True), StructField('last_name', StringType(), True), StructField('email', StringType(), True)]), True)]), False), False)
Describe the solution you would like
Honor ignore_column_order for StructTypes that are array items.
As a workaround, I exploded the array and flattened the schema; after that, the DataFrames compare as equal.
```python
from pyspark.sql import functions as F
import chispa

# flatten_schema is a helper that is not shown here.
# Explode the array so each struct item becomes its own row, then flatten the nested fields.
merge_df = merge_df.withColumn("representatives_check", F.explode("representatives")).drop("representatives")
select_exprs, nesting_info = flatten_schema(merge_df.schema, dot_prefix="", flat_prefix="")
merge_df = merge_df.select(*select_exprs)

expected_df = expected_df.withColumn("representatives_check", F.explode("representatives")).drop("representatives")
select_exprs, nesting_info = flatten_schema(expected_df.schema, dot_prefix="", flat_prefix="")
expected_df = expected_df.select(*select_exprs)

chispa.assert_df_equality(merge_df, expected_df, ignore_row_order=True, ignore_column_order=True)
```
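To illustrate the behaviour I am asking for, here is a rough sketch of an order-insensitive schema comparison. It is not chispa code. It assumes the schema is available as a plain dict in the shape that PySpark's `StructType.jsonValue()` returns, and the names `normalize_schema`, `field`, and `array_of_struct` are hypothetical helpers made up for this example. The schemas are a shortened version of the ones in the error above.

```python
def normalize_schema(node):
    """Return a copy of a schema dict with struct fields sorted by name at every level.

    Assumes the dict layout of PySpark's StructType.jsonValue(); field metadata is
    copied through untouched.
    """
    if isinstance(node, dict):
        out = {k: (v if k == "metadata" else normalize_schema(v)) for k, v in node.items()}
        # Only a struct type carries an ordered "fields" list; sort it by field name.
        if out.get("type") == "struct" and isinstance(out.get("fields"), list):
            out["fields"] = sorted(out["fields"], key=lambda f: f["name"])
        return out
    if isinstance(node, list):
        return [normalize_schema(v) for v in node]
    return node


def field(name, dtype):
    # Hypothetical helper: builds one field entry in jsonValue() layout.
    return {"name": name, "type": dtype, "nullable": True, "metadata": {}}


def array_of_struct(fields):
    # Hypothetical helper: an array whose items are structs with the given fields.
    return {
        "type": "array",
        "elementType": {"type": "struct", "fields": fields},
        "containsNull": False,
    }


actual = {"type": "struct", "fields": [
    field("representatives", array_of_struct([
        field("updated_at", "string"),
        field("id", "string"),
        field("rep_number", "double"),
    ])),
]}

expected = {"type": "struct", "fields": [
    field("representatives", array_of_struct([
        field("id", "string"),
        field("rep_number", "double"),
        field("updated_at", "string"),
    ])),
]}

# The raw schemas differ only by field order inside the array items;
# after normalization they compare equal, while types and nullability are still checked.
schemas_match = normalize_schema(actual) == normalize_schema(expected)
```

With real DataFrames the same idea would presumably be applied to `df.schema.jsonValue()` on both sides. Note that this only covers the schema check; the row data inside the arrays would still need its struct fields reordered before values can be compared.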