netcdf4-python
netcdf4-python copied to clipboard
Inconsistent and problematic behavior in Variable indexing
There are a number of footguns, inconsistencies, and suspicious behaviors in how _StartCountStride handles various keys for __getitem__ and __setitem__. I found these when working on the type stubs.
I created this script to test a bunch of combinations. ~I only post the 1-D variable results below, as the 2-D results didn't any more information.~ (I added a 2D case below that provides some more information.) See the Notes column of the table for footnotes to each problem I found.
Note - I use the term "integral float" to refer to a float whose value is an integer, e.g. 1.0.
1-D
Data
arr1:
array([5, 6, 7])
var1[...]:
array([5, 6, 7])
Test results
| key type | key | ds[varname][key] result |
Notes |
|---|---|---|---|
| int | 0 |
array(5) |
|
| integral float | 0.0 |
array(5) |
Note: Allowing integral floats does make some sense to avoid obnoxious forced type conversions. |
| Non-integral float | 0.8 |
array(5) |
Footgun: A float key is cooerced to an integer, rather than being detected as a logic error. I think this should be prevented by something like the _is_int() check used as was done for VLEN in #757. |
| integer string | '1' |
array(6) |
Note: Ok, we allow integer strings for keys. That's weird, but convenient I guess. |
| integral float string | '1.0' |
IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices') |
|
| Non-integral float string | '1.5' |
IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices') |
|
| bool | True |
array(6) |
Footgun: A single boolean key for a dimensioned variable is treated like an integer 1. Booleans keys only make sense as a mask for one or more dimensions. |
| list of int | [0, 1] |
array([5, 6]) |
|
| list of bool (same shape as dimension) | [True, False, True] |
array([5, 7]) |
|
| list of bool (wrong shape) | [True, False] |
IndexError('\nBoolean array must have the same shape as the data along this dimension.') | |
| list of integral float | [0.0, 1.0] |
IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices') |
Inconsistent: Why do we allow float keys but not lists of floats? |
| list of non-integral float | [0.6, 1.7] |
ValueError('slicing expression exceeds the number of dimensions of the variable') | Suspicious: Why is this a ValueError about having the wrong number of dimensions in the key? |
| list of integer str | ['0', '1'] |
ValueError('slicing expression exceeds the number of dimensions of the variable') | Inconsistent/suspicious: We allow integer strings but not lists of integer strings, and the error is not an IndexError but a ValueError. |
| list of integral float str | ['0.0', '1.0'] |
IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices') |
|
| list of non-integral float str | ['0.6', '1.7'] |
IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices') |
|
| single-valued list of int | [1] |
array([6]) |
|
| single-valued list of integral float | [1.0] |
IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices') |
|
| single-valued list of non-integral float | [1.7] |
array(6) |
Inconsistent/suspicious: A list of float keys is allowed only if there's only one value and that value is not integral! And it returns a scalar instead of an array (compare to single-valued list of int). |
| single-valued list of int str | ['1'] |
array(6) |
Inconsistent/suspicious: A list of int str keys is allowed only if there's only one value. And it returns a scalar instead of an array (compare to single-valued list of int). |
| single-valued list of integral float str | ['1.0'] |
IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices') |
|
| single-valued list of non-integral float str | ['1.7'] |
IndexError('only integers, slices (:), ellipsis (...), and 1-d integer or boolean arrays are valid indices') |
2D case
It appears that a list of (non-integral!) floats or a list of strings of integers is interpreted as row, column indices, whereas a list of int is interpreted as multiple indexes for one dimension.
Data
var2[...]:
array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])
Test and result
var2[[1, 2]]
# returns the second and third rows
# array([[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
# [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]])
var2[['1']]
# returns the second row
# array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
ds["var2"][['1', '2']]
# returns the third value in the second column
# array(22)
ds["var2"][[1.0, 2.0]]
# raises IndexError: only integers, slices (`:`), ellipsis (`...`), and 1-d integer or boolean arrays are valid indices
ds["var2"][[1.2, 2.2]]
# returns the third value in the second column
# array(22)