
Remove unused kernels when v2 transition is complete

Open jpivarski opened this issue 3 years ago • 1 comments

Here's a quick way to find out what kernels are actually used:

fgrep -r '"awkward_' src/awkward/_v2 | sed 's/.*"\(awkward_[^"]*\)".*/\1/' | sort | uniq

although a smarter approach would build a Python AST of all the src/awkward/_v2 code and search for nplike[...] subscripts in which the first tuple item is a constant string naming the kernel.
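A minimal sketch of that AST-based search, assuming Python 3.9 or later (where a subscript's slice is the expression itself). The function names `kernels_in_source` and `kernels_in_tree` are hypothetical, and the sketch accepts any subscript whose first item is a string constant starting with "awkward_" rather than checking that the subscripted object is literally named nplike:

```python
import ast
from pathlib import Path


def kernels_in_source(source):
    """Return the set of kernel names used as X["awkward_...", ...] in source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Subscript):
            continue
        index = node.slice  # Python 3.9+: the slice is the expression itself
        # nplike["name", dtype, ...] has a tuple slice; nplike["name"] does not.
        if isinstance(index, ast.Tuple) and index.elts:
            first = index.elts[0]
        else:
            first = index
        if (
            isinstance(first, ast.Constant)
            and isinstance(first.value, str)
            and first.value.startswith("awkward_")
        ):
            found.add(first.value)
    return found


def kernels_in_tree(root):
    """Collect kernel names from every .py file under root, sorted."""
    found = set()
    for path in Path(root).rglob("*.py"):
        found |= kernels_in_source(path.read_text(encoding="utf-8"))
    return sorted(found)
```

Calling `kernels_in_tree("src/awkward/_v2")` from the repository root would produce a list comparable to the one from the fgrep pipeline, without picking up kernel names that only appear in comments or unrelated strings.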

As of this writing, the list of used kernels (using the quick method) is

awkward_argsort
awkward_BitMaskedArray_to_ByteMaskedArray
awkward_BitMaskedArray_to_IndexedOptionArray
awkward_ByteMaskedArray_getitem_nextcarry
awkward_ByteMaskedArray_getitem_nextcarry_outindex
awkward_ByteMaskedArray_mask
awkward_ByteMaskedArray_numnull
awkward_ByteMaskedArray_overlay_mask
awkward_ByteMaskedArray_reduce_next_64
awkward_ByteMaskedArray_reduce_next_nonlocal_nextshifts_64
awkward_ByteMaskedArray_reduce_next_nonlocal_nextshifts_fromshifts_64
awkward_ByteMaskedArray_toIndexedOptionArray
awkward_Content_getitem_next_missing_jagged_getmaskstartstop
awkward_IndexedArray_fill
awkward_IndexedArray_fill_count
awkward_IndexedArray_fill_to64_count
awkward_IndexedArray_flatten_nextcarry
awkward_IndexedArray_getitem_nextcarry
awkward_IndexedArray_getitem_nextcarry_outindex
awkward_IndexedArray_index_of_nulls
awkward_IndexedArray_local_preparenext_64
awkward_IndexedArray_mask
awkward_IndexedArray_numnull
awkward_IndexedArray_numnull_parents
awkward_IndexedArray_numnull_unique_64
awkward_IndexedArray_overlay_mask
awkward_IndexedArray_ranges_carry_next_64
awkward_IndexedArray_ranges_next_64
awkward_IndexedArray_reduce_next_64
awkward_IndexedArray_reduce_next_fix_offsets_64
awkward_IndexedArray_reduce_next_nonlocal_nextshifts_64
awkward_IndexedArray_reduce_next_nonlocal_nextshifts_fromshifts_64
awkward_IndexedArray_simplify
awkward_IndexedArray_unique_next_index_and_offsets_64
awkward_IndexedArray_validity
awkward_IndexedOptionArray_rpad_and_clip_mask_axis1
awkward_Index_iscontiguous
awkward_Index_nones_as_index
awkward_index_rpad_and_clip_axis0
awkward_index_rpad_and_clip_axis1
awkward_ListArray_broadcast_tooffsets
awkward_ListArray_combinations
awkward_ListArray_combinations_length
awkward_ListArray_compact_offsets
awkward_ListArray_fill
awkward_ListArray_getitem_jagged_apply
awkward_ListArray_getitem_jagged_carrylen
awkward_ListArray_getitem_jagged_descend
awkward_ListArray_getitem_jagged_expand
awkward_ListArray_getitem_jagged_numvalid
awkward_ListArray_getitem_jagged_shrink
awkward_ListArray_getitem_next_array
awkward_ListArray_getitem_next_array_advanced
awkward_ListArray_getitem_next_at
awkward_ListArray_getitem_next_range
awkward_ListArray_getitem_next_range_carrylength
awkward_ListArray_getitem_next_range_counts
awkward_ListArray_getitem_next_range_spreadadvanced
awkward_ListArray_localindex
awkward_ListArray_min_range
awkward_ListArray_num
awkward_ListArray_rpad_and_clip_length_axis1
awkward_ListArray_rpad_axis1
awkward_ListArray_validity
awkward_ListOffsetArray_argsort_strings
awkward_ListOffsetArray_compact_offsets
awkward_ListOffsetArray_local_preparenext_64
awkward_ListOffsetArray_reduce_local_nextparents_64
awkward_ListOffsetArray_reduce_local_outoffsets_64
awkward_ListOffsetArray_reduce_nonlocal_findgaps_64
awkward_ListOffsetArray_reduce_nonlocal_maxcount_offsetscopy_64
awkward_ListOffsetArray_reduce_nonlocal_nextshifts_64
awkward_ListOffsetArray_reduce_nonlocal_nextstarts_64
awkward_ListOffsetArray_reduce_nonlocal_outstartsstops_64
awkward_ListOffsetArray_reduce_nonlocal_preparenext_64
awkward_ListOffsetArray_rpad_and_clip_axis1
awkward_ListOffsetArray_rpad_axis1
awkward_ListOffsetArray_rpad_length_axis1
awkward_ListOffsetArray_toRegularArray
awkward_localindex
awkward_MaskedArray_getitem_next_jagged_project
awkward_missing_repeat
awkward_NumpyArray_fill
awkward_NumpyArray_rearrange_shifted
awkward_NumpyArray_reduce_adjust_starts_64
awkward_NumpyArray_reduce_adjust_starts_shifts_64
awkward_NumpyArray_reduce_mask_ByteMaskedArray_64
awkward_NumpyArray_sort_asstrings_uint8
awkward_NumpyArray_subrange_equal
awkward_NumpyArray_unique_strings
awkward_quick_sort
awkward_reduce_argmax
awkward_reduce_argmin
awkward_reduce_count_64
awkward_reduce_countnonzero
awkward_reduce_max
awkward_reduce_min
awkward_reduce_prod
awkward_reduce_prod_bool
awkward_reduce_sum
awkward_reduce_sum_bool
awkward_reduce_sum_int32_bool_64
awkward_reduce_sum_int64_bool_64
awkward_RegularArray_broadcast_tooffsets
awkward_RegularArray_broadcast_tooffsets_size1
awkward_RegularArray_combinations_64
awkward_RegularArray_compact_offsets
awkward_RegularArray_getitem_carry
awkward_RegularArray_getitem_jagged_expand
awkward_RegularArray_getitem_next_array
awkward_RegularArray_getitem_next_array_advanced
awkward_RegularArray_getitem_next_array_regularize
awkward_RegularArray_getitem_next_at
awkward_RegularArray_getitem_next_range
awkward_RegularArray_getitem_next_range_spreadadvanced
awkward_RegularArray_localindex
awkward_RegularArray_num
awkward_RegularArray_rpad_and_clip_axis1
awkward_sort
awkward_sorting_ranges
awkward_sorting_ranges_length
awkward_UnionArray_fillindex
awkward_UnionArray_fillindex_count
awkward_UnionArray_filltags
awkward_UnionArray_filltags_const
awkward_UnionArray_project
awkward_UnionArray_regular_index
awkward_UnionArray_regular_index_getsize
awkward_UnionArray_simplify
awkward_UnionArray_simplify_one
awkward_UnionArray_validity
awkward_unique
awkward_unique_copy
awkward_unique_offsets
awkward_unique_ranges

This is 135 kernels used by v2 out of 206 kernels currently defined. Using cloc to count lines of code:

-------------------------------------------------------------------------------
                             files          blank        comment           code
-------------------------------------------------------------------------------
Currently existing kernels     206            798            218          14178
-------------------------------------------------------------------------------
Kernels used by v2             131            495            140           9237
-------------------------------------------------------------------------------

That is, we can drop 35% of the lines of code.
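For reference, the 35% figure follows from the "code" column of the cloc table. A small sketch of the arithmetic (the variable names are only for illustration):

```python
existing_code = 14178  # "code" column: currently existing kernels
used_code = 9237       # "code" column: kernels used by v2

droppable = existing_code - used_code
fraction = droppable / existing_code
print(f"{droppable} of {existing_code} lines ({fraction:.0%}) could be dropped")
```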

Even this list of 135 kernels may be inflated: some of them could possibly be replaced by NumPy calls, which hasn't been investigated because the old code was translated as-is. A manual scan over the 135 would reveal that, but we should wait on that scan until v2 is done.

jpivarski • Nov 17 '21 16:11

@jpivarski - and kernel-specification.yml will shrink as well.

ianna • Nov 17 '21 16:11

@jpivarski

I am closing this issue because no action is needed: all unused kernels had already been removed before I looked into it.

This can be checked in the following manner:


fgrep -R --exclude-dir=cuda '"awkward_' src/awkward/  | sed 's/.*"\(awkward_[^"]*\)".*/\1/' | sort | uniq > used_kernels.txt

python 

import os

# Kernel names referenced from Python, one per line (written by the fgrep pipeline above).
with open("used_kernels.txt") as f:
    used_kernels = {line.strip() for line in f if line.strip()}

# Kernel names defined on disk: file names in src/cpu-kernels/ without their extension.
existing_kernels = {os.path.splitext(name)[0] for name in os.listdir("src/cpu-kernels/")}

# Kernels that exist but are never referenced; an empty list means nothing is unused.
print(sorted(existing_kernels - used_kernels))

ioanaif • Nov 14 '22 08:11

Great, thanks for checking!

jpivarski • Nov 14 '22 15:11