
Inconsistent and problematic behavior in Variable indexing

Open RandallPittmanOrSt opened this issue 1 year ago • 0 comments

There are a number of footguns, inconsistencies, and suspicious behaviors in how _StartCountStride handles various keys for __getitem__ and __setitem__. I found these when working on the type stubs.

I created this script to test a bunch of combinations. ~I only post the 1-D variable results below, as the 2-D results didn't add any more information.~ (I added a 2-D case below that provides some more information.) See the Notes column of the table for footnotes on each problem I found.
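For readers who want to run a similar survey, here is a minimal sketch of such a harness. It is not the script referenced above. The names `probe` and `KEYS` are made up for illustration, and the stand-in data is a plain Python list, so the printed results show list behavior rather than netCDF4 behavior. Passing a `netCDF4.Variable` as `indexable` should exercise the same code path the table reports on.

```python
def probe(indexable, keys):
    """Index `indexable` with each key; record the value or the exception raised."""
    results = []
    for label, key in keys:
        try:
            outcome = repr(indexable[key])
        except Exception as exc:  # record any failure instead of stopping the run
            outcome = f"{type(exc).__name__}({str(exc)!r})"
        results.append((label, repr(key), outcome))
    return results


# Hypothetical key set covering a few of the categories discussed in this issue.
KEYS = [
    ("int", 0),
    ("integral float", 0.0),
    ("non-integral float", 0.8),
    ("integer string", "1"),
    ("bool", True),
]

# Stand-in data: a plain list, NOT a netCDF4 variable.
for row in probe([5, 6, 7], KEYS):
    print(" | ".join(row))
```

Recording the exception type alongside the message matters here, because several of the findings are about the wrong exception class being raised.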

Note - I use the term "integral float" to refer to a float whose value is an integer, e.g. 1.0.

1-D

Data

arr1:

array([5, 6, 7])

var1[...]:

array([5, 6, 7])

Test results

| Key type | Key | `ds[varname][key]` result | Notes |
| --- | --- | --- | --- |
| int | `0` | `array(5)` | |
| integral float | `0.0` | `array(5)` | Note: Allowing integral floats does make some sense to avoid obnoxious forced type conversions. |
| non-integral float | `0.8` | `array(5)` | Footgun: A float key is coerced to an integer rather than being detected as a logic error. I think this should be prevented by something like the `_is_int()` check that was added for VLEN in #757. |
| integer string | `'1'` | `array(6)` | Note: Ok, we allow integer strings for keys. That's weird, but convenient I guess. |
| integral float string | `'1.0'` | `IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices')` | |
| non-integral float string | `'1.5'` | `IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices')` | |
| bool | `True` | `array(6)` | Footgun: A single boolean key for a dimensioned variable is treated like the integer 1. Boolean keys only make sense as a mask for one or more dimensions. |
| list of int | `[0, 1]` | `array([5, 6])` | |
| list of bool (same shape as dimension) | `[True, False, True]` | `array([5, 7])` | |
| list of bool (wrong shape) | `[True, False]` | `IndexError('\nBoolean array must have the same shape as the data along this dimension.')` | |
| list of integral float | `[0.0, 1.0]` | `IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices')` | Inconsistent: Why do we allow float keys but not lists of floats? |
| list of non-integral float | `[0.6, 1.7]` | `ValueError('slicing expression exceeds the number of dimensions of the variable')` | Suspicious: Why is this a ValueError about having the wrong number of dimensions in the key? |
| list of integer str | `['0', '1']` | `ValueError('slicing expression exceeds the number of dimensions of the variable')` | Inconsistent/suspicious: We allow integer strings but not lists of integer strings, and the error is a ValueError rather than an IndexError. |
| list of integral float str | `['0.0', '1.0']` | `IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices')` | |
| list of non-integral float str | `['0.6', '1.7']` | `IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices')` | |
| single-valued list of int | `[1]` | `array([6])` | |
| single-valued list of integral float | `[1.0]` | `IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices')` | |
| single-valued list of non-integral float | `[1.7]` | `array(6)` | Inconsistent/suspicious: A list of float keys is allowed only if it holds a single value and that value is not integral! And it returns a scalar instead of an array (compare to single-valued list of int). |
| single-valued list of int str | `['1']` | `array(6)` | Inconsistent/suspicious: A list of int str keys is allowed only if it holds a single value. And it returns a scalar instead of an array (compare to single-valued list of int). |
| single-valued list of integral float str | `['1.0']` | `IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices')` | |
| single-valued list of non-integral float str | `['1.7']` | `IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices')` | |
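To make the scalar-key footguns concrete, here is a rough sketch of what a stricter check could look like. This is not the library's code and not the `_is_int()` helper from #757. `normalize_scalar_key` is a hypothetical name, and rejecting strings outright is an assumption of this sketch (the current behavior accepts integer strings, and a more permissive variant could keep that).

```python
import operator


def normalize_scalar_key(key):
    """Hypothetical helper: return `key` as an int, or raise IndexError."""
    # bool is a subclass of int, so reject it explicitly: a lone boolean
    # is almost certainly a mistake rather than an index of 0 or 1.
    if isinstance(key, bool):
        raise IndexError("a single boolean is not a valid index")
    # Assumption of this sketch: strings are rejected rather than parsed.
    if isinstance(key, (str, bytes)):
        raise IndexError(f"string keys are not valid indices: {key!r}")
    # Accept floats only when they hold an integer value (e.g. 1.0).
    if isinstance(key, float):
        if not key.is_integer():
            raise IndexError(f"non-integral float is not a valid index: {key!r}")
        return int(key)
    try:
        # operator.index accepts int and any object that defines __index__.
        return operator.index(key)
    except TypeError:
        raise IndexError(f"invalid index: {key!r}") from None
```

With a check like this, `0` and `0.0` would both map to index 0, while `0.8`, `True`, and `'1'` would all raise `IndexError`, so every rejected scalar key fails with the same exception type.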

2D case

It appears that a list of (non-integral!) floats or a list of strings of integers is interpreted as row, column indices, whereas a list of int is interpreted as multiple indexes for one dimension.

Data

var2[...]:

array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])

Test and result

var2[[1, 2]]
# returns the second and third rows
# array([[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
#        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])

var2[['1']]
# returns the second row
# array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

ds["var2"][['1', '2']]
# returns the third value in the second row, i.e. the same as var2[1, 2]
# array(22)

ds["var2"][[1.0, 2.0]]
# raises IndexError: only integers, slices (`:`), ellipsis (`...`), and 1-d integer or boolean arrays are valid indices

ds["var2"][[1.2, 2.2]]
# returns the third value in the second row, i.e. the same as var2[1, 2]
# array(22)
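The distinction at stake in the 2D case is between a tuple key (one entry per dimension) and a list key (several positions along one dimension). Below is a minimal sketch of a key expander that keeps the two apart. `expand_key` is a hypothetical name, the sketch ignores `Ellipsis` and NumPy arrays for brevity, and it does not describe how `_StartCountStride` actually works.

```python
def expand_key(key, ndim):
    """Hypothetical helper: return one entry per dimension for `key`."""
    # A tuple means "one entry per dimension"; anything else addresses
    # the first dimension only.
    entries = key if isinstance(key, tuple) else (key,)
    if len(entries) > ndim:
        raise IndexError(
            f"too many indices: got {len(entries)} for {ndim} dimensions"
        )
    for entry in entries:
        if isinstance(entry, list):
            # A list selects positions along a single dimension, so it must
            # hold only true ints, or only bools (a mask), never a mix,
            # floats, or strings.
            all_bool = all(isinstance(e, bool) for e in entry)
            all_int = all(
                isinstance(e, int) and not isinstance(e, bool) for e in entry
            )
            if not (all_bool or all_int):
                raise IndexError(
                    f"list keys must contain only ints or only bools: {entry!r}"
                )
    # Pad the remaining dimensions with full slices.
    return entries + (slice(None),) * (ndim - len(entries))
```

Under these rules `expand_key([1, 2], 2)` stays a selection of two rows, `expand_key((1, 2), 2)` is the row/column pair, and both `['1', '2']` and `[1.2, 2.2]` raise `IndexError` instead of silently being reinterpreted as a row/column pair.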

RandallPittmanOrSt, Jul 12 '24 23:07