
Fuse operations in `equal_rows_arr`

Open Dandandan opened this issue 1 year ago • 1 comments

Is your feature request related to a problem or challenge?

`equal_rows_arr` compares pairs of arrays for equality at the given indices, and it shows up in profiles.

Currently this is done in the following way:

  • taking the values at the indices for the first pair of arrays
  • comparing the taken arrays using `eq` or `not_distinct`
  • doing the same for the remaining pairs and ANDing the results
  • filtering the indices based on the resulting boolean array
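For readers unfamiliar with the function, the steps above can be sketched roughly as follows. This is a simplified model, not the actual DataFusion code: it assumes plain `Vec<Option<i64>>` columns in place of Arrow arrays, and the names `Column`, `take`, `compare`, and `equal_rows_unfused` are hypothetical. The null handling for the `eq` case is also simplified (a null comparison is treated as `false`).

```rust
/// Hypothetical stand-in for a nullable Arrow column; `None` models a null slot.
type Column = Vec<Option<i64>>;

/// The "take" step: copies the values at `indices` into a new column.
fn take(column: &Column, indices: &[usize]) -> Column {
    indices.iter().map(|&i| column[i]).collect()
}

/// Element-wise comparison that allocates a fresh boolean mask.
/// With `null_equals_null` set, two nulls compare equal (like `not_distinct`);
/// otherwise any null is treated as "not equal" (a simplification of `eq`).
fn compare(left: &Column, right: &Column, null_equals_null: bool) -> Vec<bool> {
    left.iter()
        .zip(right)
        .map(|(l, r)| match (l, r) {
            (Some(a), Some(b)) => a == b,
            (None, None) => null_equals_null,
            _ => false,
        })
        .collect()
}

/// Unfused version: take, compare, AND, then filter the indices.
fn equal_rows_unfused(
    left_indices: &[usize],
    right_indices: &[usize],
    left_columns: &[Column],
    right_columns: &[Column],
    null_equals_null: bool,
) -> (Vec<usize>, Vec<usize>) {
    let mut mask = vec![true; left_indices.len()];
    for (left, right) in left_columns.iter().zip(right_columns) {
        // Two copies and one new mask per column pair.
        let left_taken = take(left, left_indices);
        let right_taken = take(right, right_indices);
        let equal = compare(&left_taken, &right_taken, null_equals_null);
        // ANDing allocates yet another mask.
        mask = mask.iter().zip(&equal).map(|(a, b)| *a && *b).collect();
    }
    // The "filter" step: keep only the indices whose mask bit is set.
    let keep = |indices: &[usize]| -> Vec<usize> {
        indices
            .iter()
            .zip(&mask)
            .filter(|(_, m)| **m)
            .map(|(&i, _)| i)
            .collect()
    };
    (keep(left_indices), keep(right_indices))
}
```

Every column pair costs two copied columns and two boolean masks in this model, which is the overhead the issue is about.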

Describe the solution you'd like

We could optimize this in several ways:

  • writing a kernel that doesn't use `take` (i.e. doesn't copy the array) but compares the arrays directly at the given indices
  • writing results to a single `BooleanBuffer` rather than creating a new one every time
  • removing non-matching indices from the list (e.g. using `Vec::retain`) rather than creating a boolean array for a filter
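A minimal sketch of how the first and third ideas might combine, again assuming plain `Vec<Option<i64>>` columns instead of Arrow arrays. The names `Column`, `slots_equal`, and `equal_rows_fused` are hypothetical, and storing the candidates as `(left, right)` index pairs in one `Vec` is an assumption made for illustration, not something the issue prescribes.

```rust
/// Hypothetical stand-in for a nullable Arrow column; `None` models a null slot.
type Column = Vec<Option<i64>>;

/// Compares two slots. With `null_equals_null` set, two nulls compare equal
/// (like `not_distinct`); otherwise any null is treated as "not equal".
fn slots_equal(left: Option<i64>, right: Option<i64>, null_equals_null: bool) -> bool {
    match (left, right) {
        (Some(a), Some(b)) => a == b,
        (None, None) => null_equals_null,
        _ => false,
    }
}

/// Fused version: reads values directly through the indices (no `take` copy)
/// and drops non-matching pairs in place (no boolean mask, no filter pass).
fn equal_rows_fused(
    pairs: &mut Vec<(usize, usize)>,
    left_columns: &[Column],
    right_columns: &[Column],
    null_equals_null: bool,
) {
    for (left, right) in left_columns.iter().zip(right_columns) {
        pairs.retain(|&(l, r)| slots_equal(left[l], right[r], null_equals_null));
        // Nothing left to compare, so later columns can be skipped.
        if pairs.is_empty() {
            break;
        }
    }
}
```

Because `retain` shrinks the candidate list after each column pair, later columns only inspect rows that still match. Whether this beats a vectorized comparison on real Arrow arrays would need to be confirmed by benchmarks.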

Describe alternatives you've considered

No response

Additional context

No response

Dandandan avatar Aug 23 '24 09:08 Dandandan

take

Rachelint avatar Aug 26 '24 10:08 Rachelint

Unassigning myself from this since I'm a bit busy at the moment. @LeslieKid, maybe you would be interested in this.

Rachelint avatar Nov 11 '24 13:11 Rachelint

take

LeslieKid avatar Nov 11 '24 13:11 LeslieKid