datafusion
Fuse operations in `equal_rows_arr`
Is your feature request related to a problem or challenge?
`equal_rows_arr` compares pairs of arrays at the given indices for equality, and it shows up in profiles.
Currently this is done in the following way:
- `take` the values for the indices for the first pair
- comparing the arrays using `eq` or `not_distinct`
- doing the same for the next pairs and `and`ing the results
- filtering the indices based on the resulting boolean array
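To make the cost of these steps concrete, here is a rough model of them in plain Rust. It is only a sketch: `Column`, `take`, `compare`, and `equal_rows_unfused` are hypothetical stand-ins written over `Vec<Option<i64>>`, not the Arrow kernels or the actual DataFusion function, and a null comparison result is simply treated as `false`.

```rust
/// Stand-in for an Arrow array (assumption): a column of nullable integers.
type Column = Vec<Option<i64>>;

/// Models `take`: gathers `col[i]` for each index, copying into a new vector.
fn take(col: &[Option<i64>], indices: &[usize]) -> Column {
    indices.iter().map(|&i| col[i]).collect()
}

/// Models `eq` / `not_distinct`: element-wise comparison into a new mask.
/// With `null_equals_null` two nulls compare equal; otherwise any null gives false.
fn compare(l: &[Option<i64>], r: &[Option<i64>], null_equals_null: bool) -> Vec<bool> {
    l.iter()
        .zip(r)
        .map(|(a, b)| match (a, b) {
            (Some(x), Some(y)) => x == y,
            (None, None) => null_equals_null,
            _ => false,
        })
        .collect()
}

/// Unfused flow: take, compare, and, then filter both index lists.
fn equal_rows_unfused(
    left_idx: &[usize],
    right_idx: &[usize],
    left_cols: &[Column],
    right_cols: &[Column],
    null_equals_null: bool,
) -> (Vec<usize>, Vec<usize>) {
    let mut mask = vec![true; left_idx.len()];
    for (l, r) in left_cols.iter().zip(right_cols) {
        // Two copies per column pair.
        let lt = take(l, left_idx);
        let rt = take(r, right_idx);
        // One fresh mask per column pair, plus another for the `and`.
        let eq = compare(&lt, &rt, null_equals_null);
        mask = mask.iter().zip(&eq).map(|(a, b)| *a && *b).collect();
    }
    // Final filter builds two more vectors.
    let keep = |idx: &[usize]| {
        idx.iter()
            .zip(&mask)
            .filter(|(_, m)| **m)
            .map(|(i, _)| *i)
            .collect::<Vec<_>>()
    };
    (keep(left_idx), keep(right_idx))
}
```

In this model every column pair allocates four temporary vectors (two gathered columns and two masks), which is the overhead the issue is about.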
Describe the solution you'd like
We could optimize this in some ways:
- writing a kernel that doesn't use `take` (i.e. copy the array) but compares arrays based on the indices.
- writing results to a single boolean buffer rather than creating a new one every time
- removing non-matching indices from the list (e.g. using `Vec::retain`) rather than creating a boolean array for a filter
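One way these ideas might combine is sketched below in plain Rust. It is an illustration under assumptions, not a proposed implementation: `Column`, `values_equal`, and `equal_rows_fused` are hypothetical names, columns are modeled as `Vec<Option<i64>>` rather than Arrow arrays, and the two index lists are stored as a single vector of `(left, right)` pairs so that `Vec::retain` can drop both sides at once.

```rust
/// Stand-in for an Arrow array (assumption): a column of nullable integers.
type Column = Vec<Option<i64>>;

/// Compares two nullable values; two nulls are equal only if `null_equals_null` is set.
fn values_equal(a: Option<i64>, b: Option<i64>, null_equals_null: bool) -> bool {
    match (a, b) {
        (Some(x), Some(y)) => x == y,
        (None, None) => null_equals_null,
        _ => false,
    }
}

/// Hypothetical fused variant: reads values through the indices (no `take` copy)
/// and removes non-matching pairs in place (no boolean mask, no filter step).
fn equal_rows_fused(
    pairs: &mut Vec<(usize, usize)>,
    left_cols: &[Column],
    right_cols: &[Column],
    null_equals_null: bool,
) {
    for (lc, rc) in left_cols.iter().zip(right_cols) {
        // After each column, only the surviving pairs are checked against the next one.
        pairs.retain(|&(l, r)| values_equal(lc[l], rc[r], null_equals_null));
    }
}
```

Because pairs are removed after each column, later columns are only compared for rows that still match. Whether this beats a vectorized comparison into one shared boolean buffer would need to be measured.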
Describe alternatives you've considered
No response
Additional context
No response
take
Unassigning myself from this since I'm a bit busy currently. @LeslieKid, maybe you will be interested in this.
take