Mysterious error in ak.drop_none

Open: jpivarski opened this issue 7 months ago • 1 comment

Version of Awkward Array

HEAD

Description and code to reproduce

This might be an unusual case, but it shouldn't raise this error. I haven't looked into it; I'm just logging it for future research.

import awkward as ak
import numpy as np

array = ak.Array(
    ak.contents.ListArray(
        ak.index.Index64(np.array([0, 4, 8])),
        ak.index.Index64(np.array([3, 5, 12])),
        ak.contents.ByteMaskedArray(
            ak.index.Index8(np.array([0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0])),
            ak.contents.NumpyArray(
                np.array([1, 1, 0, -1, 1, -1, -1, -1, 0, 0, 0, 0])
            ),
            valid_when=False,
        ),
    ),
    check_valid=True,
)
>>> array
<Array [[1, 1, 0], [1], [0, 0, 0, 0]] type='3 * var * ?int64'>

>>> ak.drop_none(array)
<Array [[1, 1, 0], [1], [0, 0, 0, 0]] type='3 * var * int64'>

>>> ak.drop_none(array, axis=-1)
Traceback (most recent call last):
  File "/home/jpivarski/irishep/awkward/src/awkward/_dispatch.py", line 62, in dispatch
    next(gen_or_result)
  File "/home/jpivarski/irishep/awkward/src/awkward/operations/ak_drop_none.py", line 56, in drop_none
    return _impl(array, axis, highlevel, behavior, attrs)
  File "/home/jpivarski/irishep/awkward/src/awkward/operations/ak_drop_none.py", line 121, in _impl
    out = ak._do.recursively_apply(out, recompute_offsets, depth_context=options)
  File "/home/jpivarski/irishep/awkward/src/awkward/_do.py", line 36, in recursively_apply
    return layout._recursively_apply(
  File "/home/jpivarski/irishep/awkward/src/awkward/contents/listarray.py", line 1588, in _recursively_apply
    result = action(
  File "/home/jpivarski/irishep/awkward/src/awkward/operations/ak_drop_none.py", line 94, in recompute_offsets
    out = layout._rebuild_without_nones(none_indexes, layout.content)
  File "/home/jpivarski/irishep/awkward/src/awkward/contents/listarray.py", line 1530, in _rebuild_without_nones
    return self.to_ListOffsetArray64()._rebuild_without_nones(
  File "/home/jpivarski/irishep/awkward/src/awkward/contents/listarray.py", line 302, in to_ListOffsetArray64
    return self._broadcast_tooffsets64(offsets)
  File "/home/jpivarski/irishep/awkward/src/awkward/contents/listarray.py", line 429, in _broadcast_tooffsets64
    self._backend.maybe_kernel_error(
  File "/home/jpivarski/irishep/awkward/src/awkward/_backends/backend.py", line 67, in maybe_kernel_error
    raise ValueError(self.format_kernel_error(error))
ValueError: stops[i] > len(content) while attempting to get index 12 (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-26/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L20)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/irishep/awkward/src/awkward/_dispatch.py", line 38, in dispatch
    with OperationErrorContext(name, args, kwargs):
  File "/home/jpivarski/irishep/awkward/src/awkward/_errors.py", line 85, in __exit__
    self.handle_exception(exception_type, exception_value)
  File "/home/jpivarski/irishep/awkward/src/awkward/_errors.py", line 95, in handle_exception
    raise self.decorate_exception(cls, exception)
ValueError: stops[i] > len(content) while attempting to get index 12 (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-26/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L20)

This error occurred while calling

    ak.drop_none(
        <Array [[1, 1, 0], [1], [0, 0, 0, 0]] type='3 * var * ?int64'>
        axis = -1
    )
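One plausible reading of the kernel message, not confirmed anywhere in this thread: once the masked entries are dropped, the content shrinks from 12 to 8 elements, but the ListArray's original stops (up to 12) are still applied to it. The pure-Python sketch below models the layout with plain lists and shows the offsets that an axis=-1 drop would need to produce. The helper name `drop_none_innermost` is hypothetical and is not part of Awkward Array.

```python
# Pure-Python model of the layout from the report (no Awkward Array needed).
starts = [0, 4, 8]
stops = [3, 5, 12]
mask = [0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0]  # valid_when=False: 0 means valid
data = [1, 1, 0, -1, 1, -1, -1, -1, 0, 0, 0, 0]


def drop_none_innermost(starts, stops, mask, data, valid_when=False):
    """Hypothetical helper: compact the valid entries list by list and
    rebuild offsets that index into the compacted content."""
    offsets = [0]
    compact = []
    for start, stop in zip(starts, stops):
        for i in range(start, stop):
            if bool(mask[i]) == valid_when:
                compact.append(data[i])
        # Each list ends wherever the compacted content currently ends.
        offsets.append(len(compact))
    return offsets, compact


offsets, compact = drop_none_innermost(starts, stops, mask, data)
lists = [compact[a:b] for a, b in zip(offsets[:-1], offsets[1:])]

# The original stops no longer fit the compacted content, which would
# match a "stops[i] > len(content)" complaint at index 12.
stale_stops = max(stops) > len(compact)
```

In this model the lists come out the same as the axis=None result above, which is what one would expect from axis=-1 here, since none of the masked entries fall inside a list.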

jpivarski, Jan 02 '24 21:01

I've started working on this, but it's hard to reason about all the ways this function can act. The problem is not localised; reasoning about axes is always tricky. I'm just recording my working thoughts for now:

As I currently understand it, axis=X is orthogonal to record structure, so we should not treat var * [x * ..., y * ...] differently to var * x * .... Yet, we do not permit axis=0 for record arrays because it may shift the field structure if x and y do not have missing values in the same place.
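To make the axis=0 concern concrete, here is a minimal sketch with plain Python lists standing in for the two record fields. The field values are made up for illustration, and the `drop_none` helper here is a local toy function, not `ak.drop_none`.

```python
# Two fields of a record array, stored column-wise, with None in different rows.
x = [1, None, 3]
y = [None, 2.2, 3.3]


def drop_none(values):
    """Toy per-field drop: remove None entries from one column."""
    return [v for v in values if v is not None]


rows_before = list(zip(x, y))
rows_after = list(zip(drop_none(x), drop_none(y)))

# After the per-field drop, the first row pairs x from original row 0
# with y from original row 1: the fields have shifted against each other.
first_row_is_misaligned = rows_after[0] not in rows_before
```

Both columns still have the same length in this example, so nothing fails outright; the rows are silently re-paired, which is arguably the worse outcome.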

We need to act in two places: at axis == depth to fix lists, and at axis == depth - 1 to drop the missing values. The following content types need to be considered in the latter case:

option-like = option | record[option-like] | union[option-like]

Right now we don't consider all of these possible branches.
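The grammar above can be sketched as a recursive predicate over a toy type description. This is only an illustration: the tuple encoding is invented for this example and is not Awkward Array's layout API, and the grammar does not say whether a record or union counts as option-like when any branch qualifies or only when all do. The sketch assumes "any" and exposes that choice as a parameter.

```python
# Toy type encoding (hypothetical, for illustration only):
#   ("option", inner) | ("record", {name: type}) | ("union", [types])
#   ("list", inner)   | ("numpy", dtype_name)


def is_option_like(t, combine=any):
    """Return True if t matches
    option-like = option | record[option-like] | union[option-like].

    combine=any is an assumption; pass combine=all for the stricter reading.
    """
    kind = t[0]
    if kind == "option":
        return True
    if kind == "record":
        return combine(is_option_like(f, combine) for f in t[1].values())
    if kind == "union":
        return combine(is_option_like(c, combine) for c in t[1])
    # Lists and leaf arrays are not option-like at this depth.
    return False


int_type = ("numpy", "int64")
opt_int = ("option", int_type)
record_mixed = ("record", {"x": opt_int, "y": int_type})
union_plain = ("union", [int_type, ("numpy", "float64")])
list_of_opt = ("list", opt_int)
```

Under the "any" assumption `record_mixed` is option-like, while with `combine=all` it is not; `list_of_opt` is not option-like in either reading because the option sits one level deeper.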

agoose77, Jan 05 '24 14:01