
ar != None unexpected behaviour

andrzejnovak opened this issue 5 years ago.

Using ar != None results in an unexpected mask.

Consider:

import awkward as ak
import numpy as np

a = ak.Array([None, 2, None, 2, None])
b = np.array([None, 2, None, 2, None])
a, b
>>> (<Array [None, 2, None, 2, None] type='5 * ?int64'>,
 array([None, 2, None, 2, None], dtype=object))
a != None
>>> <Array [None, True, None, True, None] type='5 * ?bool'>

b != None
>>> array([False,  True, False,  True, False])

Expected behaviour:

a != None
>>> <Array [False, True, False, True, False] type='5 * ?bool'>

such that

a[a != None]
>>> <Array [2, 2] type='2 * ?int64'>

andrzejnovak, Oct 13 '20 15:10

There's a whole discussion of this issue, starting at https://github.com/scikit-hep/awkward-1.0/issues/490#issuecomment-712250246

When I wrote it there, I thought I was narrowly saving it from oblivion because I only remembered hearing it on Slack. I could have just pointed to this issue. Anyway, the cross-reference will be useful in solving it.

jpivarski, Oct 20 '20 03:10

(I'm going through old issues, deciding what to do with them.)

I think that the array == None and array != None behavior should not be changed. In the examples you presented, @andrzejnovak, the results look non-intuitive, but they're following a rule that applies to all mathematical functions, including == and !=, and I think it would be dangerous to have exceptions to that behavior. If we made

ak.Array([1, 2, None, 3, None, 4]) == None

return

ak.Array([False, False, True, False, True, False])

then someone else's use-case might break because they were assuming that == and != would act like all other mathematical functions. Namely,

ak.Array([1, 2, None, 3, None, 4]) + 10

returns

ak.Array([11, 12, None, 13, None, 14])

That is, the scalar 10 broadcasts and the None values pass through any mathematical operation. When applied to == (same for !=), the expected result of

ak.Array([1, 2, None, 3, None, 4]) == None

would be

ak.Array([False, False, None, False, None, False])

because each integer == None is False and each missing value passes through.
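The pass-through rule can be modeled in a few lines of plain Python. This is a simplified model, not Awkward Array's implementation; broadcast_scalar is a hypothetical helper that applies a binary operator between each entry and a scalar and leaves missing entries untouched:

```python
import operator
from typing import Any, Callable, List, Optional


def broadcast_scalar(
    op: Callable[[Any, Any], Any], values: List[Optional[Any]], scalar: Any
) -> List[Optional[Any]]:
    """Apply op(x, scalar) to each present entry; None entries pass through unchanged."""
    return [None if x is None else op(x, scalar) for x in values]


data = [1, 2, None, 3, None, 4]

print(broadcast_scalar(operator.add, data, 10))   # [11, 12, None, 13, None, 14]
print(broadcast_scalar(operator.eq, data, None))  # [False, False, None, False, None, False]
```

The same function produces both results, so == None receives no special treatment; giving it the behaviour requested in the issue would require branching on the value of the scalar.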

@agoose77, do you concur? If you think there is something that we should do that would benefit all use-cases, reopen the issue. Thanks!

jpivarski, Nov 10 '22 21:11

@jpivarski I had the same thought. If we allow x == None, then we run into the problem that x == None has a different type from x == 10 (logically), unless we used an UnmaskedArray to keep the option type. Then, what about x == [None]? Maybe we'd allow that, but then what about x == [None] vs x == [None, 1]? It seems like a can of worms!
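The typing problem can be illustrated with a plain-Python sketch. eq_special_cased is a hypothetical operator that treats a None scalar specially; it is not Awkward Array's behaviour, only a model of the proposal:

```python
from typing import List, Optional


def eq_special_cased(values: List[Optional[int]], scalar: Optional[int]) -> List[Optional[bool]]:
    """Hypothetical == that special-cases a None scalar (NOT what Awkward Array does)."""
    if scalar is None:
        # Special case: report missingness directly, so the result has no missing values
        return [x is None for x in values]
    # General rule: missing entries pass through, present entries are compared
    return [None if x is None else x == scalar for x in values]


data = [1, 2, None, 3, None, 4]
with_ten = eq_special_cased(data, 10)
with_none = eq_special_cased(data, None)

# Whether the result can contain missing values now depends on the runtime value of the scalar
print(None in with_ten)   # True
print(None in with_none)  # False
```

In an array library, that difference would surface as two different result types (?bool versus bool) for the same operator, which is the inconsistency described above.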

agoose77, Nov 10 '22 21:11