CUDA kernels that are implemented but not optimal

Open jpivarski opened this issue 7 months ago • 0 comments

This is primarily for record-keeping, so that we don't forget about CUDA kernels that should be revisited someday. To be in this list, a kernel must be implemented correctly (in main or an impending PR), but have some reason to be rewritten. The list is to help us stick to the policy that existence is the first priority and optimization is second, without the temptation to go down a rabbit-hole of optimizing every kernel before moving on to the next one.

Variable-length inner loop:

  • [ ] awkward_IndexedArray_ranges_next_64
  • [ ] awkward_IndexedArray_ranges_carry_next_64
  • [ ] awkward_ListArray_getitem_jagged_numvalid
  • [ ] awkward_ListArray_getitem_next_range_spreadadvanced
  • [ ] awkward_ListArray_broadcast_tooffsets
  • [ ] awkward_ListArray_localindex
  • [ ] awkward_ListOffsetArray_drop_none_indexes
  • [ ] awkward_ListOffsetArray_reduce_local_nextparents_64
  • [ ] awkward_ListArray_rpad_axis1
  • [ ] awkward_ListOffsetArray_rpad_axis1
  • [ ] awkward_ListArray_combinations_length
  • [ ] awkward_NumpyArray_pad_zero_to_length
  • [ ] awkward_NumpyArray_rearrange_shifted
  • [ ] awkward_UnionArray_flatten_combine
  • [ ] awkward_UnionArray_nestedfill_tags_index
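For anyone revisiting these, here is a rough sketch of what "variable-length inner loop" means, using the semantics of awkward_ListArray_localindex as the example. This is plain Python rather than the actual CUDA kernel, and the function names `localindex_per_list` and `localindex_per_element` are hypothetical, invented for this illustration. It assumes `offsets` is non-decreasing and starts at 0. The first version assigns one unit of work per list, so the inner loop runs for as long as that list is (on a GPU, threads with short lists would wait on threads with long ones). The second assigns one unit of work per output element and finds the owning list by binary search, which is one possible way to remove the inner loop, not necessarily the rewrite that will be chosen.

```python
from bisect import bisect_right


def localindex_per_list(offsets):
    """One unit of work per list: the inner loop length varies per list."""
    out = [0] * offsets[-1]
    for i in range(len(offsets) - 1):      # would be one GPU thread per list i
        start, stop = offsets[i], offsets[i + 1]
        for j in range(stop - start):      # variable-length inner loop
            out[start + j] = j
    return out


def localindex_per_element(offsets):
    """One unit of work per output element: no inner loop over list contents."""
    out = [0] * offsets[-1]
    for k in range(offsets[-1]):           # would be one GPU thread per element k
        i = bisect_right(offsets, k) - 1   # index of the list that owns element k
        out[k] = k - offsets[i]            # position of k within that list
    return out


if __name__ == "__main__":
    offsets = [0, 3, 3, 5]                 # lists of length 3, 0, and 2
    print(localindex_per_list(offsets))
    print(localindex_per_element(offsets))
```

Both functions produce the same result; the difference is only in how the work is divided. The per-element version trades the inner loop for a binary search per element, so whether it is actually faster on a GPU would need to be measured.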

jpivarski, Jan 25 '24 14:01