
Support comparison operators on nested data types (Struct, List, ..)

Open · Blizzara opened this issue 8 months ago · 2 comments

Is your feature request related to a problem or challenge?

We're working on running some used-to-be-Spark pipelines through DataFusion. One gap we've noticed is comparing nested values such as lists and structs. [Spark allows](https://github.com/apache/spark/blame/d9394eee5ebbeb695baaec6122da2ed970842dfd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L1025) comparing (==, !=, <, >, <=, >=, ..) columns of structs and lists, whereas in DataFusion these comparisons appear to throw errors:

For structs, from our internal testing:

ArrowError(InvalidArgumentError("Invalid comparison operation: Struct([Field { name: \"a\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"b\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }]) <= Struct([Field { name: \"a\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"b\", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }])"), None)

For lists, the failure is already captured in DataFusion's own tests: https://github.com/apache/datafusion/blob/3773fb7fb54419f889e7d18b73e9eb48069eb08e/datafusion/sqllogictest/test_files/array_query.slt#L44

Perhaps this needs to be fixed in Arrow directly, since the error comes from https://github.com/apache/arrow-rs/blob/087f34b70e97ee85e1a54b3c45c5ed814f500b0a/arrow-ord/src/cmp.rs#L219?
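For the non-null case, the behaviour being asked for can be sketched with plain Rust standard-library types instead of Arrow arrays. This is an illustration under assumptions, not DataFusion or arrow-rs code: `AbRow` is a hypothetical stand-in for one row of the `Struct<a: Int32, b: Int32>` column from the error above, `IntList` is a hypothetical alias for an Int32 list with nullable elements, and `all_ops` is a hypothetical helper. Rust's derived `PartialOrd` for structs and its `Vec`/`Option` ordering are lexicographic, which we assume matches Spark for these inputs.

```rust
/// Hypothetical stand-in for one row of `Struct<a: Int32, b: Int32>`.
/// Deriving `PartialOrd`/`Ord` compares field `a` first, then `b`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct AbRow {
    a: i32,
    b: i32,
}

/// Hypothetical Int32 list value; a `None` element stands for a null element.
/// `Vec` ordering is lexicographic, `Option` puts `None` before any `Some`,
/// and a strict prefix sorts before the longer list.
type IntList = Vec<Option<i32>>;

/// Evaluate all six operators for one pair, in the order ==, !=, <, <=, >, >=.
fn all_ops<T: PartialOrd>(l: &T, r: &T) -> [bool; 6] {
    [l == r, l != r, l < r, l <= r, l > r, l >= r]
}
```

For example, `all_ops(&AbRow { a: 1, b: 2 }, &AbRow { a: 1, b: 3 })` yields `[false, true, true, true, false, false]`: field `a` ties, so field `b` decides.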

Describe the solution you'd like

Allow binary comparison predicates on structs and lists, preferably with the same semantics as Spark (as far as I can tell, Spark essentially does a depth-first comparison over all the fields: https://github.com/apache/spark/blob/d9394eee5ebbeb695baaec6122da2ed970842dfd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala#L285)
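A minimal sketch of that depth-first comparison, assuming Spark-like rules that we have not checked against every Spark version: children are compared in order and the first difference decides, null children sort before non-null ones, a list that is a strict prefix of another sorts first, and a NULL at the top level makes the whole predicate NULL. `Value`, `CmpOp`, `cmp_child`, `cmp_value` and `eval` are hypothetical names, not DataFusion or Arrow APIs; a real implementation would work on Arrow arrays rather than a tree of boxed values.

```rust
use std::cmp::Ordering;

/// Hypothetical nested value, loosely modeled on Arrow Int32 / List / Struct.
/// A `None` child stands for a null list element or a null struct field.
#[derive(Debug)]
enum Value {
    Int(i32),
    List(Vec<Option<Value>>),
    /// Field values in schema order.
    Struct(Vec<Option<Value>>),
}

/// Compare two nullable children; nulls sort first (assumed Spark-like).
fn cmp_child(a: &Option<Value>, b: &Option<Value>) -> Ordering {
    match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => Ordering::Less,
        (Some(_), None) => Ordering::Greater,
        (Some(x), Some(y)) => cmp_value(x, y),
    }
}

/// Depth-first lexicographic comparison: the first differing child decides.
/// Panics on mismatched types; a planner would reject those before execution.
fn cmp_value(a: &Value, b: &Value) -> Ordering {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => x.cmp(y),
        (Value::List(xs), Value::List(ys)) | (Value::Struct(xs), Value::Struct(ys)) => {
            for (x, y) in xs.iter().zip(ys) {
                let ord = cmp_child(x, y);
                if ord != Ordering::Equal {
                    return ord;
                }
            }
            // Every shared position is equal, so the shorter one sorts first.
            xs.len().cmp(&ys.len())
        }
        _ => panic!("cannot compare {a:?} with {b:?}"),
    }
}

/// The comparison operators listed in the issue.
#[derive(Debug, Clone, Copy)]
enum CmpOp {
    Eq,
    NotEq,
    Lt,
    LtEq,
    Gt,
    GtEq,
}

/// `a op b` with SQL semantics for top-level nulls: NULL in, NULL (`None`) out.
fn eval(op: CmpOp, a: Option<&Value>, b: Option<&Value>) -> Option<bool> {
    let ord = cmp_value(a?, b?);
    Some(match op {
        CmpOp::Eq => ord.is_eq(),
        CmpOp::NotEq => ord.is_ne(),
        CmpOp::Lt => ord.is_lt(),
        CmpOp::LtEq => ord.is_le(),
        CmpOp::Gt => ord.is_gt(),
        CmpOp::GtEq => ord.is_ge(),
    })
}
```

For example, with `s1` and `s2` built as `Value::Struct`s for `{a: 1, b: 2}` and `{a: 1, b: 3}`, `eval(CmpOp::LtEq, Some(&s1), Some(&s2))` returns `Some(true)`, while `eval(CmpOp::Lt, None, Some(&s2))` returns `None`. Ordering nested nulls like regular values while keeping three-valued logic only at the top level is one possible choice; DataFusion could pick a different rule for nested nulls.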

Describe alternatives you've considered

No response

Additional context

Related to https://github.com/apache/datafusion/issues/2326

Blizzara · Jun 10 '24 15:06