Improve coverage of tests for array expressions
What is the problem the feature request solves?
I would like to start implementing some shared test code for array expressions to make it easier to test for edge cases such as null or empty arrays.
The following functions need improved tests:
- [ ] array_remove
- [ ] array_contains
- [ ] array_append
- [ ] array_intersect
- [ ] array_join
- [ ] array_repeat
- [ ] array_compact
- [ ] arrays_overlap
Describe the potential solution
No response
Additional context
No response
We should also start testing with all supported data types for the array expressions.
Here is an example of an issue that we should have been able to catch using fuzz testing:
https://github.com/apache/datafusion-comet/issues/1307
Would it help to extend makeParquetFileAllTypes or create a new method which creates a parquet file that contains more complex structures like lists (with all datatypes and structs because the current tests usually test the implementation with arrays only containing 1-3 elements) and structs to improve the tests with additional test cases?
However I think that requires complex type support for the comet native parquet reader.
Would it help to extend makeParquetFileAllTypes or create a new method which creates a parquet file that contains more complex structures like lists (with all datatypes and structs because the current tests usually test the implementation with arrays only containing 1-3 elements) and structs to improve the tests with additional test cases?
However I think that requires complex type support for the comet native parquet reader.
Yes, exactly. I have started down this path in https://github.com/apache/datafusion-comet/pull/1308
As you said, until we have complex type support we cannot generate Parquet files containing complex types. For now I had to work around this by generating a Parquet file with primitive types and then use the array expression in SQL to create arrays to pass into array_remove.
Complex type support for Parquet is coming very soon though.
I did try disabling native Parquet scans and use the COMET_CONVERT_FROM_PARQUET_ENABLED feature, but this failed with an error that I need to investigate.