`ak.where` produces different output types and shapes in awkward2 versus awkward1 since PR 3168

Status: open. ikrommyd opened this issue 6 months ago (17 comments).

Version of Awkward Array

2.8.4

Description and code to reproduce

The following raises an error in awkward2:

In [1]: import awkward as ak

In [2]: import pathlib

In [3]: condition = ak.from_json(pathlib.Path("condition_v2.json"))

In [4]: array = ak.from_json(pathlib.Path("DPhiHJ0J1_v2.json"))

In [5]: ak.where(condition, -999, array)

The last call fails with:

ValueError: cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)

This error occurred while calling

    ak.where(
        <Array [None, None, ..., [False], [False]] type='9610 * option[var ...'>
        -999
        <Array [None, None, ..., [2.74], [3.06]] type='9610 * option[var * ...'>
    )

while it works fine in awkward1:

In [1]: import pathlib

In [2]: import awkward as ak

In [3]: array = ak.from_json(pathlib.Path("DPhiHJ0J1_v2.json"))

In [4]: array
Out[4]: <Array [None, None, None, [3.12, ... [3.06]] type='9610 * option[var * float64]'>

In [5]: condition = ak.from_json(pathlib.Path("condition_v2.json"))

In [6]: ak.where(condition, -999, array)
Out[6]: <Array [None, None, None, [3.12, ... [3.06]] type='9610 * option[var * float64]'>

In [7]: ak.__version__
Out[7]: '1.10.5'

The arrays in JSON format are attached here: DPhiHJ0J1_v2.json, condition_v2.json

What is unusual in this example is that the mask (condition) is itself an option type. Maybe such masks shouldn't be allowed at all and a more graceful error should be raised; an option-type mask feels a bit odd to me, and I don't know what precedent Awkward's history offers for such cases.

Edit: It turned out that the original JSON arrays in this issue had broadcasting problems originating in user code, and that part is unrelated to awkward. However, there have been inconsistencies in the shape and type of ak.where's output since https://github.com/scikit-hep/awkward/pull/3168. For example, before that PR (and in awkward1):

In [1]: import awkward as ak

In [2]: x = ak.Array([None, [True], None, [False]])

In [3]: y = ak.Array([None, [1], None, [2]])

In [4]: z = ak.Array([10, 11, 12, 13])

In [5]: ak.where(x, y, z)
Out[5]: <Array [None, [1], None, [13]] type='4 * option[var * int64]'> 

With that PR:

In [5]: ak.where(x, y, z)
Out[5]: <Array [[None], [1], [None], [13]] type='4 * var * ?int64'>

I find the former output type and shape a lot more intuitive. We should probably try to get the best of both worlds: that PR fixed some problems with ak.where, and those fixes should stay, but it also introduced output type and shape changes that are probably not ideal. See the long discussion below for more info.

ikrommyd, Jun 27 '25 17:06

@pfackeldey I'm opening this for reference, even though I already told you about the bug.

ikrommyd, Jun 27 '25 17:06

@ikrommyd - thank you for reporting it!!

ianna, Jun 28 '25 06:06

Hmm, these arrays are weird. This is probably actually an awkward1 bug. These do not match:

In [48]: ak.drop_none(ak.num(array))
Out[48]: <Array [1, 0, 0, 1, 0, 0, 0, 1, ..., 1, 1, 0, 1, 1, 1, 1, 1] type='891 * int64'>

In [49]: ak.drop_none(ak.num(condition))
Out[49]: <Array [1, 0, 0, 1, 0, 0, 0, 1, ..., 1, 1, 0, 1, 1, 1, 1, 1] type='910 * int64'>

and also

In [54]: ak.all(ak.fill_none(ak.num(array), -999) == ak.fill_none(ak.num(condition), -999))
Out[54]: np.False_
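The comparison above answers "do the row lengths agree?" with a single boolean. A minimal pure-Python sketch that instead reports *which* rows disagree, assuming each array is represented as a Python list whose rows are either `None` or a list (as `ak.to_list` would give). `row_length` and `mismatched_rows` are hypothetical helpers, not Awkward functions; they reuse the -999 sentinel from the `ak.fill_none` call above.

```python
from typing import List, Optional, Sequence

# A row is either None (missing) or a Python list (one jagged entry).
Row = Optional[list]


def row_length(row: Row, sentinel: int = -999) -> int:
    """Length of a jagged row, or the sentinel when the row is missing."""
    return sentinel if row is None else len(row)


def mismatched_rows(a: Sequence[Row], b: Sequence[Row], sentinel: int = -999) -> List[int]:
    """Indices where a and b disagree on row length or on whether the row is None.

    Elementwise analogue of
    ak.fill_none(ak.num(a), sentinel) == ak.fill_none(ak.num(b), sentinel),
    but returning the offending indices instead of reducing with ak.all.
    """
    if len(a) != len(b):
        raise ValueError(f"outer lengths differ: {len(a)} != {len(b)}")
    return [
        i
        for i, (ra, rb) in enumerate(zip(a, b))
        if row_length(ra, sentinel) != row_length(rb, sentinel)
    ]


# Rows of the kind that turn up in the user's data further down in this thread.
condition = [None, [False], []]
array = [None, [2.94], None]
print(mismatched_rows(condition, array))  # [2]
```

Rows flagged this way are candidates for the "cannot broadcast nested list" error, though the chunked search later in the thread suggests that not every such row triggers it.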

ak.where seems fine in small examples like:

In [56]: ak.where([None, True, None, False], -999, [None, 1, None, 2])
Out[56]: <Array [None, -999, None, 2] type='4 * ?int64'>

and

In [57]: ak.where([None, [True], None, [False]], -999, [None, [1], None, [2]])
Out[57]: <Array [[None], [-999], [None], [2]] type='4 * var * ?int64'>

It also seems fine with subsets of the arrays, such as

In [59]: condition[-10:]
Out[59]: <Array [None, None, None, ..., [False], [False]] type='10 * option[var * bool]'>

In [60]: array[-10:]
Out[60]: <Array [None, None, None, ..., [2.74], [3.06]] type='10 * option[var * floa...'>

In [61]: ak.where(condition[-10:], -999, array[-10:])
Out[61]: <Array [[None], [None], [None], ..., [2.74], [3.06]] type='10 * var * ?float64'>

and

In [63]: condition[:10]
Out[63]: <Array [None, None, None, ..., None, None, None] type='10 * option[var * bool]'>

In [64]: array[:10]
Out[64]: <Array [None, None, None, ..., None, None] type='10 * option[var * float64]'>

In [65]: ak.where(condition[:10], -999, array[:10])
Out[65]: <Array [[None], [None], [None], ..., [None], [None]] type='10 * var * ?float64'>

It also doesn't error if you broadcast the arrays manually first and pass the broadcast results to ak.where:

In [68]: ak.broadcast_arrays(array, condition)
Out[68]:
[<Array [None, None, None, ..., [2.74], [3.06]] type='9610 * option[var * fl...'>,
 <Array [None, None, ..., [False], [False]] type='9610 * option[var * bool]'>]

In [69]: ak.where(ak.broadcast_arrays(array, condition)[1], -999, ak.broadcast_arrays(array, condition)[0])
Out[69]: <Array [[None], [None], ..., [2.74], [3.06]] type='9610 * var * ?float64'>

Notice, however, that there is a difference in behavior: awkward1 doesn't error and returns this:

In [6]: ak.where(condition, -999, array)
Out[6]: <Array [None, None, None, [3.12, ... [3.06]] type='9610 * option[var * float64]'>

So awkward1 does not wrap the Nones in a jagged dimension, while awkward2 does, giving [None] in the final output array in the cases where it works (all the examples above). Awkward1 seems to ignore the Nones, leaving them as they are, and performs the where operation only on everything else.
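The awkward1 behavior described above can be modeled in a few lines of pure Python. This is a sketch inferred only from the awkward1 outputs shown in this thread, not Awkward's implementation: it assumes that a row which is `None` in any argument stays `None`, and that every other row is filled elementwise, with scalars (such as -999, or the per-row values of `z`) broadcast into the row. `where_outer_option` and its helpers are hypothetical names.

```python
from typing import Any, List, Optional


def _rows(arg: Any, n: int) -> list:
    """Expand a scalar argument (such as -999) to n rows; check list arguments."""
    if isinstance(arg, list):
        if len(arg) != n:
            raise ValueError("outer lengths differ")
        return arg
    return [arg] * n


def _inner(row: Any, length: int) -> list:
    """Broadcast a scalar row into a list; require list rows to match in length."""
    if isinstance(row, list):
        if len(row) != length:
            raise ValueError("cannot broadcast nested list")
        return row
    return [row] * length


def where_outer_option(cond: list, x: Any, y: Any) -> List[Optional[list]]:
    """Hypothetical model of the awkward1 ak.where results shown in this thread.

    cond is a list whose rows are None or lists of booleans. A row that is
    None in cond, x, or y stays None in the output; every other row is
    filled elementwise from x where the condition is true, else from y.
    """
    n = len(cond)
    out: List[Optional[list]] = []
    for c, xr, yr in zip(cond, _rows(x, n), _rows(y, n)):
        if c is None or xr is None or yr is None:
            out.append(None)  # leave missing rows as they are
            continue
        xs, ys = _inner(xr, len(c)), _inner(yr, len(c))
        out.append([a if keep else b for keep, a, b in zip(c, xs, ys)])
    return out


print(where_outer_option([None, [True], None, [False]], -999, [None, [1], None, [2]]))
# [None, [-999], None, [2]]
```

Because the missing-row check looks at all three arguments, this rule also returns None when the missing value sits in a branch that is never selected, which is consistent with the awkward1 answers that later comments in this thread call wrong.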

This may simply be an awkward1 problem (which we won't fix), plus the question of whether a better error, or some better handling of such arrays, is possible.

@pfackeldey Did you investigate any further what's happening here while I was away?

ikrommyd, Jun 28 '25 11:06

I also found that the shapes don't match. Also, we do support ak.where with Nones; see the tests: https://github.com/scikit-hep/awkward/blob/main/tests/test_3098_ak_where_with_arrays_containing_optionals.py

I didn't investigate further because, as you pointed out, this does not look like a problem in awkward, at least to my understanding.

The last part (that it works after manual broadcasting) is interesting, though. I'm not sure I fully understand what is happening there. We do have a special code path for broadcasting options in ak.where: https://github.com/scikit-hep/awkward/blob/main/src/awkward/_broadcasting.py#L861

pfackeldey, Jun 28 '25 12:06

Well, there is still some inconsistency, right? Take the cases where it works, for example here:

In [59]: condition[-10:]
Out[59]: <Array [None, None, None, ..., [False], [False]] type='10 * option[var * bool]'>

In [60]: array[-10:]
Out[60]: <Array [None, None, None, ..., [2.74], [3.06]] type='10 * option[var * floa...'>

In [61]: ak.where(condition[-10:], -999, array[-10:])
Out[61]: <Array [[None], [None], [None], ..., [2.74], [3.06]] type='10 * var * ?float64'>

The condition is N * option[var * bool] and the array is N * option[var * float64]. ak.where seems to broadcast these to N * var * ?float64, so the Nones are wrapped in a jagged dimension, while awkward1 keeps the N * option[var * float64] type. See:

In [6]: ak.where(condition, -999, array)
Out[6]: <Array [None, None, None, [3.12, ... [3.06]] type='9610 * option[var * float64]'>

In [7]: ak.__version__
Out[7]: '1.10.5'

You can see this in the following simple example:

In [1]: import awkward as ak

In [2]: ak.__version__
Out[2]: '1.10.5'

In [3]: ak.where([None, [True], None, [False]], -999, [None, [1], None, [2]])
Out[3]: <Array [None, [-999], None, [2]] type='4 * option[var * int64]'>

versus

In [1]: import awkward as ak

In [2]: ak.__version__
Out[2]: '2.8.4'

In [3]: ak.where([None, [True], None, [False]], -999, [None, [1], None, [2]])
Out[3]: <Array [[None], [-999], [None], [2]] type='4 * var * ?int64'>

So something is definitely not consistent in how ak.where applies broadcasting.

ikrommyd, Jun 28 '25 14:06

See here, for example, how ak.where changes the type while addition doesn't:

In [21]: x = ak.Array([None, [True], None, [False]])

In [22]: y = ak.Array([None, [1], None, [2]])

In [23]: z = ak.Array([10, 11, 12, 13])

In [24]: ak.where(x, y, z)
Out[24]: <Array [[None], [1], [None], [13]] type='4 * var * ?int64'>

In [25]: x + y
Out[25]: <Array [None, [2], None, [2]] type='4 * option[var * int64]'>

In [26]: x + z
Out[26]: <Array [None, [12], None, [13]] type='4 * option[var * int64]'>

In [27]: y + z
Out[27]: <Array [None, [12], None, [15]] type='4 * option[var * int64]'>

Plain broadcasting doesn't change the type like that:

In [10]: ak.broadcast_arrays(x, y, -999)
Out[10]:
[<Array [None, [True], None, [False]] type='4 * option[var * bool]'>,
 <Array [None, [1], None, [2]] type='4 * option[var * int64]'>,
 <Array [None, [-999], None, [-999]] type='4 * option[var * int64]'>]

awkward1, on the other hand, gives:

In [8]: ak.where(x, y, z)
Out[8]: <Array [None, [1], None, [13]] type='4 * option[var * int64]'>

ikrommyd, Jun 28 '25 14:06

@ikrommyd, if we use awkward 2, is it wrong? We will not be fixing awkward 1, and I don't think we should use it as a reference. Please check whether ak.where gives you correct results with awkward2. Thanks!

ianna, Jun 28 '25 15:06

I think the original failure with ak.where happens because the condition and the array have different raggedness, so I'd say it is correct that it fails in awkward 2. I'd rather consider awkward 1 wrong in that case.

As for the second (new) part: the fact that ak.where produces different output dimensions for broadcastable inputs is something to look into; however, it's a completely different issue from the one this thread started with.

pfackeldey, Jun 28 '25 15:06

Well, the original issue is that the ak.where behavior differs, and it indeed does: ak.where changes the type in this case. I find this odd, but I can't say with absolute certainty what the right behavior is. In general, I don't think ak.where should change the type like that.

ikrommyd, Jun 28 '25 15:06

> @ikrommyd, if we use awkward 2, is it wrong? We will not be fixing awkward 1, and I don't think we should use it as a reference. Please check whether ak.where gives you correct results with awkward2. Thanks!

Well, the original arrays seem to be somewhat malformed, which points to a problem in user code, and I don't think ak.where gives a strictly "wrong" result. However, that doesn't change the fact that ak.where's type-changing behavior seems a bit off.

ikrommyd, Jun 28 '25 15:06

@pfackeldey There you go: I ran ak.where on slices of length 10 to find the problem.

In [9]: length = len(condition)
   ...: problematic = []
   ...:
   ...: for start in range(0, length, 10):
   ...:     end = min(start + 10, length)
   ...:     cond_chunk = condition[start:end]
   ...:     arr_chunk  = array   [start:end]
   ...:     try:
   ...:         _ = ak.where(cond_chunk, -999, arr_chunk)
   ...:     except Exception as e:
   ...:         print(f"❌ Error in chunk {start}:{end} \n {e!r}")
   ...:         print("   condition slice:", cond_chunk)
   ...:         print("   array slice:    ", arr_chunk)
   ...:         problematic.append((start, end, e))
   ...:
   ...: if not problematic:
   ...:     print("✅ No errors found in any 10-element chunk.")
   ...: else:
   ...:     print(f"\nFound {len(problematic)} problematic chunk(s):")
   ...:     for start, end, err in problematic:
   ...:         print(f" • {start}:{end} → {err!r}")
   ...:
❌ Error in chunk 1340:1350
 ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
   condition slice: [None, None, None, None, None, None, [False], [], None, None]
   array slice:     [None, None, None, None, None, None, None, None, None, None]
❌ Error in chunk 4390:4400
 ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
   condition slice: [[], None, None, None, None, None, None, None, None, []]
   array slice:     [[], None, None, None, None, None, None, None, None, None]
❌ Error in chunk 7760:7770
 ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
   condition slice: [None, None, None, None, [], None, None, None, None, None]
   array slice:     [None, None, None, None, None, None, None, None, None, None]
❌ Error in chunk 9420:9430
 ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
   condition slice: [None, None, None, None, None, None, None, None, [], None]
   array slice:     [None, None, None, None, None, None, None, None, None, None]
❌ Error in chunk 9430:9440
 ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
   condition slice: [None, [False], [], None, None, None, None, None, None, None]
   array slice:     [None, [2.94], None, None, None, None, None, None, None, None]

Found 5 problematic chunk(s):
 • 1340:1350 → ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
 • 4390:4400 → ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
 • 7760:7770 → ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
 • 9420:9430 → ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
 • 9430:9440 → ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')

ikrommyd, Jun 28 '25 15:06

Actually, ak.where(ak.to_packed(cond_chunk), -999, ak.to_packed(arr_chunk)) is the correct thing to use when slicing, and it prints:

❌ Error in chunk 320:330
 ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
   condition slice: [[], None, None, None, None, [False], None, [], None, None]
   array slice:     [[], None, None, None, None, None, None, [], None, None]
❌ Error in chunk 780:790
 ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
   condition slice: [None, None, [], [False], None, None, None, None, None, None]
   array slice:     [None, None, [], None, None, None, None, None, None, None]
❌ Error in chunk 1340:1350
 ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
   condition slice: [None, None, None, None, None, None, [False], [], None, None]
   array slice:     [None, None, None, None, None, None, None, None, None, None]
❌ Error in chunk 2320:2330
 ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
   condition slice: [None, [False], None, None, None, None, None, None, None, None]
   array slice:     [None, None, None, None, None, None, None, None, None, None]
❌ Error in chunk 2620:2630
 ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
   condition slice: [None, None, None, None, [False], None, None, None, None, None]
   array slice:     [None, None, None, None, None, None, None, None, None, None]
❌ Error in chunk 2890:2900
 ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
   condition slice: [None, None, None, None, None, None, None, None, None, [False]]
   array slice:     [None, None, None, None, None, None, None, None, None, None]
❌ Error in chunk 5100:5110
 ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
   condition slice: [None, None, None, None, None, None, None, [False], None, None]
   array slice:     [None, None, None, None, None, None, None, None, None, None]
❌ Error in chunk 9030:9040
 ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
   condition slice: [None, None, None, None, [False], None, None, None, None, None]
   array slice:     [None, None, None, None, None, None, None, None, None, None]
❌ Error in chunk 9430:9440
 ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
   condition slice: [None, [False], [], None, None, None, None, None, None, None]
   array slice:     [None, [2.94], None, None, None, None, None, None, None, None]

Found 9 problematic chunk(s):
 • 320:330 → ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
 • 780:790 → ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
 • 1340:1350 → ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
 • 2320:2330 → ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
 • 2620:2630 → ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
 • 2890:2900 → ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
 • 5100:5110 → ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
 • 9030:9040 → ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
 • 9430:9440 → ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')

while ak.where(ak.to_list(cond_chunk), -999, ak.to_list(arr_chunk)) gives:

❌ Error in chunk 320:330
 ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
   condition slice: [[], None, None, None, None, [False], None, [], None, None]
   array slice:     [[], None, None, None, None, None, None, [], None, None]
❌ Error in chunk 780:790
 ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
   condition slice: [None, None, [], [False], None, None, None, None, None, None]
   array slice:     [None, None, [], None, None, None, None, None, None, None]
❌ Error in chunk 9430:9440
 ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
   condition slice: [None, [False], [], None, None, None, None, None, None, None]
   array slice:     [None, [2.94], None, None, None, None, None, None, None, None]

Found 3 problematic chunk(s):
 • 320:330 → ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
 • 780:790 → ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')
 • 9430:9440 → ValueError('cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)')

So the user's code seems to produce inputs like this:

ak.where([None, [False], []], -999, [None, [2.94], None])

or with two rows:

ak.where([[False], []], -999, [[2.94], None])

which does indeed give:

ValueError: cannot broadcast nested list (in compiled code: https://github.com/scikit-hep/awkward/blob/awkward-cpp-46/awkward-cpp/src/cpu-kernels/awkward_ListArray_broadcast_tooffsets.cpp#L27)

This error occurred while calling

    ak.where(
        [None, [False], []]
        -999
        [None, [2.94], None]
    )

Awkward1 handled these as follows:

In [3]: ak.where([None, [False], []], -999, [None, [2.94], None])
Out[3]: <Array [None, [2.94], None] type='3 * option[var * float64]'>

In [4]: ak.where([[False], []], -999, [[2.94], None])
Out[4]: <Array [[2.94], None] type='2 * option[var * float64]'>

ikrommyd, Jun 28 '25 15:06

If we look at the single-element cases of the above:

In [78]: ak.__version__
Out[78]: '2.8.4'

In [79]: ak.where([[]], -999, [None])
Out[79]: <Array [[]] type='1 * var * ?int64'>

In [80]: ak.where([[False]], -999, [[2.94]])
Out[80]: <Array [[2.94]] type='1 * var * float64'>

versus

In [6]: ak.__version__
Out[6]: '1.10.5'

In [7]: ak.where([[]], -999, [None])
Out[7]: <Array [None] type='1 * option[var * int64]'>

In [8]: ak.where([[False]], -999, [[2.94]])
Out[8]: <Array [[2.94]] type='1 * var * float64'>

To me, what awkward2 does here looks weird and wrong: it seems to just pass the empty condition through. awkward1 seems to treat an empty list as False and gives [None] back:

In [7]: ak.where([[]], -999, [None])
Out[7]: <Array [None] type='1 * option[var * int64]'>

ikrommyd, Jun 28 '25 15:06

> Well, the original arrays seem to be somewhat malformed, which points to a problem in user code, and I don't think ak.where gives a strictly "wrong" result. However, that doesn't change the fact that ak.where's type-changing behavior seems a bit off.

Yes, that's what I meant. We should investigate the type change, but not the 'problem' of ak.where with this condition and array as inputs: that is not actually a problem but a correct failure, because their shapes do not align, as your chunked examples show.

pfackeldey, Jun 28 '25 16:06

So this comes directly from the apply_step function call, which returns this empty array:

In [79]: ak.where([[]], -999, [None])
Out[79]: <Array [[]] type='1 * var * ?int64'>

Awkward1 appears to treat it as if it were the following (an empty jagged dimension counts as False), although this version changes the type to 1 * ?int64, while awkward1 keeps 1 * option[var * int64]:

In [4]: ak.where([False], -999, [None])
Out[4]: <Array [None] type='1 * ?int64'>

vs

In [7]: ak.where([[]], -999, [None])
Out[7]: <Array [None] type='1 * option[var * int64]'>

ikrommyd, Jun 28 '25 16:06

Oh, it looks like ak.where has a dedicated broadcasting function for option types that only it uses: https://github.com/scikit-hep/awkward/blob/473029b8cc7345083c29186e0c3def874eb3a9f2/src/awkward/_broadcasting.py#L861

Changing ak.where to use the nominal option-type broadcasting preserves the type in the first examples I tried:

diff --git a/src/awkward/_broadcasting.py b/src/awkward/_broadcasting.py
index 5f6903f7..e29a4140 100644
--- a/src/awkward/_broadcasting.py
+++ b/src/awkward/_broadcasting.py
@@ -1157,7 +1157,7 @@ def apply_step(
             if all(not x.is_option or isinstance(x, UnmaskedArray) for x in contents):
                 return broadcast_any_option_all_UnmaskedArray()
             elif options["function_name"] == "ak.where":
-                return broadcast_any_option_akwhere()
+                return broadcast_any_option()
             else:
                 return broadcast_any_option()

In [1]: import awkward as ak

In [2]: x = ak.Array([None, [True], None, [False]])

In [3]: y = ak.Array([None, [1], None, [2]])

In [4]: z = ak.Array([10, 11, 12, 13])

In [5]: ak.where(x, y, z)
Out[5]: <Array [None, [1], None, [13]] type='4 * option[var * int64]'>

In [6]: ak.where([[]], -999, [None])
Out[6]: <Array [None] type='1 * option[var * float64]'>

It also makes the errors go away here:

In [3]: ak.where([None, [False], []], -999, [None, [2.94], None])
Out[3]: <Array [None, [2.94], None] type='3 * option[var * float64]'>

In [4]: ak.where([[False], []], -999, [[2.94], None])
Out[4]: <Array [[2.94], None] type='2 * option[var * float64]'>

In [1]: import pathlib

In [2]: import awkward as ak

In [3]: condition = ak.from_json(pathlib.Path("condition_v2.json"))

In [4]: array = ak.from_json(pathlib.Path("DPhiHJ0J1_v2.json"))

In [5]: ak.where(condition, -999, array)
Out[5]: <Array [None, None, None, ..., [2.74], [3.06]] type='9610 * option[var * fl...'>

However, this change causes 3 test failures:

FAILED tests/test_3098_ak_where_with_arrays_containing_optionals.py::test_ak_where_with_optional_unknowns - assert [None] == [1]
FAILED tests/test_3098_ak_where_with_arrays_containing_optionals.py::test_ak_where_with_optionals - assert [None, 5, None] == [1, 5, None]
FAILED tests/test_3098_ak_where_with_arrays_containing_optionals.py::test_ak_where_with_optionals_multidim - assert [[1, 2], None] == [[1, 2], [12, 13]]

I wonder if we can have the best of both worlds somehow when it comes to broadcasting here.
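The failing assertions above (for example, [None, 5, None] where [1, 5, None] is expected) look like the difference between two rules for missing values. A flat pure-Python sketch of both, with hypothetical function names and made-up inputs (not the inputs used in the test file): `where_propagate` returns None wherever any argument is missing, which appears to be what the nominal option broadcasting gives, and `where_select` returns None only where the condition or the branch actually selected is missing, which appears to be what the PR 3168 tests expect.

```python
from typing import Any, List, Optional, Sequence


def where_propagate(cond: Sequence[Optional[bool]], x: Sequence[Any], y: Sequence[Any]) -> List[Any]:
    """Missing wherever cond, x, or y is missing, whichever branch is chosen."""
    return [
        None if c is None or a is None or b is None else (a if c else b)
        for c, a, b in zip(cond, x, y)
    ]


def where_select(cond: Sequence[Optional[bool]], x: Sequence[Any], y: Sequence[Any]) -> List[Any]:
    """Missing only where cond is missing or the selected branch is missing."""
    return [None if c is None else (a if c else b) for c, a, b in zip(cond, x, y)]


# Hypothetical inputs, chosen so that each missing value sits in an unselected branch.
cond = [True, False, None]
x = [1, None, 3]
y = [None, 7, 8]
print(where_propagate(cond, x, y))  # [None, None, None]
print(where_select(cond, x, y))     # [1, 7, None]
```

In the flat case the selective rule keeps the type unchanged; the open question in this thread is how to keep that correctness for jagged inputs without moving the None inside the list.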

ikrommyd, Jun 28 '25 17:06

Okay, so all of that came from this PR: https://github.com/scikit-hep/awkward/pull/3168. You will notice that its tests contain some "Fails ATOW" and "# ATOW we get X" comments (ATOW = at the time of writing), and awkward1 gives exactly those wrong answers. So it was indeed awkward1's ak.where that was problematic, and the same was true for awkward2 until this PR fixed it about a year ago. However, this PR also makes ak.where change the type, so I wonder whether we can improve on that: we should get correct answers without changing the type, if possible.

Without this PR:

In [1]: import awkward as ak

In [2]: x = ak.Array([None, [True], None, [False]])

In [3]: y = ak.Array([None, [1], None, [2]])

In [4]: z = ak.Array([10, 11, 12, 13])

In [5]: ak.where(x, y, z).layout
Out[5]:
<IndexedOptionArray len='4'>
    <index><Index dtype='int64' len='4'>
        [-1  0 -1  1]
    </Index></index>
    <content><ListOffsetArray len='2'>
        <offsets><Index dtype='int64' len='3'>[0 1 2]</Index></offsets>
        <content><NumpyArray dtype='int64' len='2'>[ 1 13]</NumpyArray></content>
    </ListOffsetArray></content>
</IndexedOptionArray>

With this PR (and currently awkward main):

In [5]: ak.where(x, y, z).layout
Out[5]:
<ListOffsetArray len='4'>
    <offsets><Index dtype='int64' len='5'>
        [0 1 2 3 4]
    </Index></offsets>
    <content><ByteMaskedArray valid_when='false' len='4'>
        <mask><Index dtype='int8' len='4'>[1 0 1 0]</Index></mask>
        <content><NumpyArray dtype='int64' len='4'>[10  1 12 13]</NumpyArray></content>
    </ByteMaskedArray></content>
</ListOffsetArray>
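Decoding the two layouts by hand shows where the missing values live. The sketch below is a simplified pure-Python reading of the buffers printed above, assuming the usual Awkward conventions: a negative IndexedOptionArray index marks a missing row, ListOffsetArray row i spans content[offsets[i]:offsets[i+1]], and a ByteMaskedArray with valid_when='false' marks an entry missing where its mask byte is 1. The helper names are hypothetical and are not Awkward internals.

```python
from typing import List, Sequence


def list_offset(offsets: Sequence[int], content: Sequence) -> List[list]:
    """ListOffsetArray: row i is content[offsets[i]:offsets[i + 1]]."""
    return [list(content[start:stop]) for start, stop in zip(offsets[:-1], offsets[1:])]


def indexed_option(index: Sequence[int], content: Sequence) -> list:
    """IndexedOptionArray: a negative index means the row is missing."""
    return [None if i < 0 else content[i] for i in index]


def byte_masked(mask: Sequence[int], content: Sequence, valid_when: bool) -> list:
    """ByteMaskedArray: an entry is valid only where bool(mask byte) == valid_when."""
    return [v if bool(m) == valid_when else None for m, v in zip(mask, content)]


# Without PR 3168: option[var * int64]; the None sits outside the list.
before = indexed_option([-1, 0, -1, 1], list_offset([0, 1, 2], [1, 13]))

# With PR 3168: var * ?int64; every row is a list and the None sits inside it.
after = list_offset([0, 1, 2, 3, 4], byte_masked([1, 0, 1, 0], [10, 1, 12, 13], valid_when=False))

print(before)  # [None, [1], None, [13]]
print(after)   # [[None], [1], [None], [13]]
```

Note that in the second layout the masked-out slots still hold the values 10 and 12 from z: they are hidden by the mask rather than removed, and each of those rows has length 1.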

I prefer this

In [1]: import awkward as ak

In [2]: x = ak.Array([None, [True], None, [False]])

In [3]: y = ak.Array([None, [1], None, [2]])

In [4]: z = ak.Array([10, 11, 12, 13])

In [5]: ak.where(x, y, z)
Out[5]: <Array [None, [1], None, [13]] type='4 * option[var * int64]'> 

a lot more than this

In [5]: ak.where(x, y, z)
Out[5]: <Array [[None], [1], [None], [13]] type='4 * var * ?int64'>

ikrommyd, Jun 28 '25 18:06