netcdf4-python Inconsistent and problematic behavior in Variable indexing

Open · RandallPittmanOrSt opened this issue 1 year ago

There are a number of footguns, inconsistencies, and suspicious behaviors in how `_StartCountStride` handles various keys for `__getitem__` and `__setitem__`. I found these while working on the type stubs.
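As its name suggests, `_StartCountStride` turns a key into start, count, and stride values for each dimension, which is the form taken by the netCDF C library's strided read/write calls such as `nc_get_vars`. A minimal sketch of that translation for a single slice on one dimension follows. `slice_to_start_count_stride` is a hypothetical helper, not netCDF4-python code, and it deliberately ignores integer, list, and boolean keys.

```python
def slice_to_start_count_stride(key: slice, dim_len: int) -> tuple[int, int, int]:
    """Translate a slice over a dimension of length dim_len into (start, count, stride).

    Hypothetical helper: only positive strides are handled, since those map
    directly onto a strided hyperslab request.
    """
    start, stop, stride = key.indices(dim_len)  # resolves None and negative bounds
    if stride <= 0:
        raise ValueError("this sketch only handles positive strides")
    count = len(range(start, stop, stride))  # number of elements the slice selects
    return start, count, stride


# A dimension of length 3, like the one behind var1 below
print(slice_to_start_count_stride(slice(None), 3))      # (0, 3, 1)
print(slice_to_start_count_stride(slice(0, 3, 2), 3))   # (0, 2, 2)
print(slice_to_start_count_stride(slice(-2, None), 3))  # (1, 2, 1)
```

Every other key type in the tables below has to be reduced to values of this shape somehow, which is where the per-type conversion rules come into play.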

I wrote a script to test many combinations of key types. ~I only post the 1-D variable results below, as the 2-D results didn't add any more information.~ (I have since added a 2-D case below that does provide more information.) See the Notes column of the table for a note on each problem I found.

Note: I use the term "integral float" to mean a float whose value is an integer, e.g. `1.0`.
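In plain Python, this property is exactly what `float.is_integer()` tests. Ordinary `int()` conversion, by contrast, truncates toward zero without complaint, which is consistent with the key `0.8` selecting element 0 in the table below. This only illustrates the two conversions; it is not a claim about which one netCDF4-python uses internally. `is_integral_float` is a hypothetical helper name.

```python
def is_integral_float(x: float) -> bool:
    """True for floats such as 1.0 whose value is a whole number."""
    return float(x).is_integer()


# int() silently truncates toward zero instead of rejecting 0.8 or 1.7
for x in [0.0, 1.0, 0.8, 1.7, -2.5]:
    print(x, is_integral_float(x), int(x))
```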

1-D

Data

arr1:

array([5, 6, 7])

var1[...]:

array([5, 6, 7])

Test results

| Key type | Key | `ds[varname][key]` result | Notes |
|---|---|---|---|
| int | `0` | `array(5)` | |
| integral float | `0.0` | `array(5)` | Note: Allowing integral floats does make some sense, since it avoids obnoxious forced type conversions. |
| non-integral float | `0.8` | `array(5)` | Footgun: A float key is coerced to an integer rather than being detected as a logic error. I think this should be prevented with something like the `_is_int()` check, as was done for VLEN in #757. |
| integer string | `'1'` | `array(6)` | Note: OK, we allow integer strings as keys. That's weird, but convenient, I guess. |
| integral float string | `'1.0'` | IndexError('only integers, slices (`:`), ellipsis (`...`), and 1-d integer or boolean arrays are valid indices') | |
| non-integral float string | `'1.5'` | IndexError('only integers, slices (`:`), ellipsis (`...`), and 1-d integer or boolean arrays are valid indices') | |
| bool | `True` | `array(6)` | Footgun: A single boolean key for a dimensioned variable is treated like the integer `1`. Boolean keys only make sense as a mask for one or more dimensions. |
| list of int | `[0, 1]` | `array([5, 6])` | |
| list of bool (same shape as dimension) | `[True, False, True]` | `array([5, 7])` | |
| list of bool (wrong shape) | `[True, False]` | IndexError('\nBoolean array must have the same shape as the data along this dimension.') | |
| list of integral float | `[0.0, 1.0]` | IndexError('only integers, slices (`:`), ellipsis (`...`), and 1-d integer or boolean arrays are valid indices') | Inconsistent: Why do we allow float keys but not lists of floats? |
| list of non-integral float | `[0.6, 1.7]` | ValueError('slicing expression exceeds the number of dimensions of the variable') | Suspicious: Why is this a `ValueError` about the key having the wrong number of dimensions? |
| list of integer str | `['0', '1']` | ValueError('slicing expression exceeds the number of dimensions of the variable') | Inconsistent/suspicious: We allow integer strings but not lists of integer strings, and the error is a `ValueError` rather than an `IndexError`. |
| list of integral float str | `['0.0', '1.0']` | IndexError('only integers, slices (`:`), ellipsis (`...`), and 1-d integer or boolean arrays are valid indices') | |
| list of non-integral float str | `['0.6', '1.7']` | IndexError('only integers, slices (`:`), ellipsis (`...`), and 1-d integer or boolean arrays are valid indices') | |
| single-valued list of int | `[1]` | `array([6])` | |
| single-valued list of integral float | `[1.0]` | IndexError('only integers, slices (`:`), ellipsis (`...`), and 1-d integer or boolean arrays are valid indices') | |
| single-valued list of non-integral float | `[1.7]` | `array(6)` | Inconsistent/suspicious: A list of float keys is allowed only if it has exactly one value and that value is not integral! And it returns a scalar instead of an array (compare with the single-valued list of `int`). |
| single-valued list of int str | `['1']` | `array(6)` | Inconsistent/suspicious: A list of integer-string keys is allowed only if it has exactly one value, and it returns a scalar instead of an array (compare with the single-valued list of `int`). |
| single-valued list of integral float str | `['1.0']` | IndexError('only integers, slices (`:`), ellipsis (`...`), and 1-d integer or boolean arrays are valid indices') | |
| single-valued list of non-integral float str | `['1.7']` | IndexError('only integers, slices (`:`), ellipsis (`...`), and 1-d integer or boolean arrays are valid indices') | |
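One possible shape for the stricter checks suggested in the Notes column is sketched below. `normalize_scalar_key` and `normalize_list_key` are hypothetical names, and the policy (reject scalar booleans and non-integral floats, accept integral floats, keep integer strings for backward compatibility, always return a list for a list key) is an assumption drawn from the notes above, not netCDF4-python's current behavior or an agreed design.

```python
import numbers
from typing import List, Sequence


def normalize_scalar_key(key: object) -> int:
    """Hypothetical strict conversion of one index along one dimension."""
    if isinstance(key, bool):
        # A lone True/False is not a position along a dimension
        raise IndexError("a scalar boolean is not a valid index for a dimension")
    if isinstance(key, numbers.Integral):
        return int(key)
    if isinstance(key, numbers.Real):
        # Integral floats such as 1.0 are accepted; 0.8 is an error, not 0
        if float(key).is_integer():
            return int(key)
        raise IndexError(f"non-integral float index {key!r}")
    if isinstance(key, str):
        # Integer strings such as '1' are kept; '1.0' and '1.5' are rejected
        try:
            return int(key)
        except ValueError:
            raise IndexError(f"string index {key!r} is not an integer") from None
    raise IndexError(f"unsupported index type {type(key).__name__}")


def normalize_list_key(key: Sequence[object], dim_len: int) -> List[int]:
    """Apply the scalar rules to every element of a list key.

    An all-bool list is a mask and must match the dimension length; any other
    list is a list of positions. The result is always a list, so a
    single-valued key never collapses to a scalar.
    """
    items = list(key)
    if items and all(isinstance(k, bool) for k in items):
        if len(items) != dim_len:
            raise IndexError("boolean mask must have the same length as the dimension")
        return [i for i, keep in enumerate(items) if keep]
    return [normalize_scalar_key(k) for k in items]


print(normalize_scalar_key(1.0))                      # 1
print(normalize_list_key([0.0, 1.0], 3))              # [0, 1]
print(normalize_list_key([True, False, True], 3))     # [0, 2]
print(normalize_list_key(["1"], 3))                   # [1]
try:
    normalize_scalar_key(0.8)
except IndexError as exc:
    print(exc)                                        # non-integral float index 0.8
```

Because scalars and list elements go through the same function, `0.0` and `[0.0, 1.0]` are accepted alike, `0.8` and `[0.6, 1.7]` fail alike with an `IndexError`, and a single-valued list such as `['1']` yields a one-element selection instead of a scalar.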

2-D case

It appears that a list of (non-integral!) floats or a list of integer strings is interpreted as row and column indices (one index per dimension), whereas a list of ints is interpreted as multiple indices along a single dimension.
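The two readings can be modeled on a plain nested list holding the same values as `var2` below. `grid`, `select_rows`, and `select_per_dimension` are hypothetical names; the model reproduces the reported results but says nothing about how netCDF4-python actually arrives at them.

```python
# Plain nested-list stand-in with the same values as var2: 3 rows x 10 columns
grid = [[10 * (row + 1) + col for col in range(10)] for row in range(3)]


def select_rows(key: list) -> list:
    """Reading 1: every list element is an index along the first dimension."""
    return [grid[i] for i in key]


def select_per_dimension(key: list):
    """Reading 2: element k of the list indexes dimension k, like grid[key[0]][key[1]].

    int() is applied to each element; its truncation is one way that 1.2 and
    2.2 could end up selecting row 1, column 2 as in the reported results.
    """
    value = grid
    for k in key:
        value = value[int(k)]
    return value


print(select_rows([1, 2])[1][:3])          # [30, 31, 32]: rows 1 and 2 were selected
print(select_per_dimension(["1"])[:3])     # [20, 21, 22]: row 1, like var2[['1']]
print(select_per_dimension(["1", "2"]))    # 22, like ds["var2"][['1', '2']]
print(select_per_dimension([1.2, 2.2]))    # 22, like ds["var2"][[1.2, 2.2]]
```

Note that this model would also return 22 for `[1.0, 2.0]`, whereas the reported result is an `IndexError`; that asymmetry is one of the inconsistencies described here.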

Data

var2[...]:

array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])

Tests and results

```python
# var2 refers to ds["var2"]
var2[[1, 2]]
# returns the second and third rows
# array([[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
#        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])

var2[['1']]
# returns the second row
# array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

ds["var2"][['1', '2']]
# returns the value in the second row, third column (i.e. var2[1, 2])
# array(22)

ds["var2"][[1.0, 2.0]]
# raises IndexError: only integers, slices (`:`), ellipsis (`...`), and 1-d integer or boolean arrays are valid indices

ds["var2"][[1.2, 2.2]]
# returns the value in the second row, third column (i.e. var2[1, 2])
# array(22)
```
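For comparison, a plain NumPy array holding the same values rejects every float and string key used above with an `IndexError` instead of truncating or reinterpreting it, and treats a scalar `True` as a new length-1 axis rather than as the integer `1`. This sketch requires NumPy; since the exact error text differs between NumPy versions, only the exception type is relied on.

```python
import numpy as np

# Same values as var2: 3 rows x 10 columns, 10..39
a = np.arange(10, 40).reshape(3, 10)

print(a[[1, 2]].shape)  # (2, 10): two rows, as with the netCDF variable

# Float and string keys, scalar or in a list, are rejected outright
for key in (0.8, [1.0, 2.0], [1.2, 2.2], ["1"], ["1", "2"]):
    try:
        a[key]
    except IndexError as exc:
        print(f"{key!r}: IndexError ({exc})")

# A scalar boolean adds a new axis instead of acting as index 1
print(a[True].shape)  # (1, 3, 10)
```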

Jul 12 '24 23:07 RandallPittmanOrSt

netcdf4-python netcdf4-python copied to clipboard

Inconsistent and problematic behavior in Variable indexing

1-D

Data

Test results

2D case

Data

Test and result

netcdf4-python
netcdf4-python copied to clipboard