netcdf4-python
Inconsistency between writing and reading; can read using [][] or [,], but writing using [][] fails in silence
I had a bug in one of my scripts, which took me quite a while to debug. The reason for this was that one can read using either [][] or [,] methods, but it seems that when writing, [][] fails in silence, which is tricky to debug and find.
To illustrate, see the following script:
"""A simple illustration of the [][] vs [,] problem."""
import sys
import os
import numpy as np
import netCDF4 as nc4
print("python version: {}".format(sys.version))
print("nectCDF4 version: {}".format(nc4.__version__))
path = "./example.nc4"
number_of_time_entries = 3
data_0 = np.array([1.0, 2.0, 3.0])
data_1 = np.array([4.0, 5.0, 6.0])
def write_netCDF4(write_method="[][]"):
    with nc4.Dataset(path, "w", format="NETCDF4") as nc4_fh:
        nc4_fh.set_auto_mask(False)
        _ = nc4_fh.createDimension('station', 2)
        _ = nc4_fh.createDimension('time', number_of_time_entries)
        observation = nc4_fh.createVariable('observation', 'f4', ('station', 'time'))
        if write_method == "[][]":
            # does not work
            observation[0][:] = data_0
            observation[1][:] = data_1
        if write_method == "[,]":
            # does work
            observation[0, :] = data_0
            observation[1, :] = data_1

def read_netCDF4(read_method="[][]"):
    with nc4.Dataset(path, "r", format="NETCDF4") as nc4_fh:
        print(" reading from file, result is:")
        print(" {}".format(nc4_fh["observation"][0][:]))
        print(" {}".format(nc4_fh["observation"][1][:]))
        if (np.all(nc4_fh["observation"][0][:] == data_0) and np.all(nc4_fh["observation"][1][:] == data_1)):
            print(" success reading using [][]")
        else:
            print(" failure reading using [][]")
        # use np.all here too, so both checks require every element to match
        if (np.all(nc4_fh["observation"][0, :] == data_0) and np.all(nc4_fh["observation"][1, :] == data_1)):
            print(" success reading using [,]")
        else:
            print(" failure reading using [,]")

for write_method in ["[][]", "[,]"]:
    print("write using method {}".format(write_method))
    write_netCDF4(write_method=write_method)
    read_netCDF4()
os.remove(path)
Which on my machine produces output:
python version: 3.8.2 (default, Jul 16 2020, 14:00:26)
[GCC 9.3.0]
nectCDF4 version: 1.5.3
write using method [][]
reading from file, result is:
[-- -- --]
[-- -- --]
failure reading using [][]
failure reading using [,]
write using method [,]
reading from file, result is:
[1. 2. 3.]
[4. 5. 6.]
success reading using [][]
success reading using [,]
I think it is quite tricky that a syntax that works for reading does not work for writing. Any idea how to fix this? If the syntax is not working, would it be possible to get either an exception, or an error message, or a warning message?
When you are reading, nc4_fh["observation"][0][:] works because nc4_fh["observation"][0] returns a numpy array, which is then sliced using [:]. Writing is a different story. We can't support all the numpy slicing features, so there are bound to be things that work for reading but not writing. Having said that, it is disturbing that it doesn't fail, but returns the wrong answer. I will take a look and see what's going on.
Note that
observation[0] = data_0
observation[1] = data_1
does work as expected.
I completely agree that supporting all the ways to assign would be far too much hassle, so as you say I wrote this mostly focused on the 'if it is the wrong way, it should fail or raise an error' :) .
Turns out the [][] slicing syntax doesn't get forwarded to __setitem__ at all. Not sure why.
OK, here's what's happening with
observation[0][:] = data_0
observation[0] forwards to __getitem__ and returns a fully masked array (since no data has yet been written to the variable). data_0 is then assigned to that temporary masked array via the numpy.ma __setitem__. The netcdf4-python Variable.__setitem__ is never called.
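To make the dispatch concrete, here is a minimal pure-Python sketch. ToyVariable is a hypothetical stand-in, not the netCDF4 Variable class: it records which special methods Python calls, uses plain lists instead of masked arrays, and uses None in place of the fill value.

```python
class ToyVariable:
    """Hypothetical stand-in for a file-backed variable (not the netCDF4 API)."""

    def __init__(self, n_rows, n_cols):
        # "On-disk" storage; None plays the role of a missing/fill value.
        self._disk = [[None] * n_cols for _ in range(n_rows)]
        self.calls = []  # record which special methods Python invokes

    def __getitem__(self, key):
        self.calls.append("__getitem__")
        row = key[0] if isinstance(key, tuple) else key
        # Reading returns a fresh in-memory copy, detached from storage.
        return list(self._disk[row])

    def __setitem__(self, key, value):
        self.calls.append("__setitem__")
        row = key[0] if isinstance(key, tuple) else key
        self._disk[row] = list(value)


var = ToyVariable(2, 3)

# Chained form: var[0] calls __getitem__, then "[:] = ..." assigns into the
# temporary list that __getitem__ returned. Storage is never touched.
var[0][:] = [1.0, 2.0, 3.0]
chained_calls = list(var.calls)
chained_row = var[0]

# Single-subscript forms go through ToyVariable.__setitem__.
var.calls.clear()
var[0, :] = [1.0, 2.0, 3.0]
var[1] = [4.0, 5.0, 6.0]
direct_calls = list(var.calls)

print(chained_calls)
print(chained_row)
print(direct_calls)
print(var[0], var[1])
```

In this toy model the chained write only ever reaches __getitem__, which matches the behavior described above: the assignment lands on a temporary object that is discarded right away.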
You could argue that this is the expected behavior - but I agree it is confusing. Problem is I don't know how to catch it and raise an exception.
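One conceivable direction, offered only as a sketch and not as something netcdf4-python does, would be to make the object returned by __getitem__ read-only, so that assigning into it raises instead of silently modifying a throwaway copy. The class below is hypothetical and uses a tuple to get immutability in pure Python; with numpy, the analogous step would be clearing the writeable flag on the returned array.

```python
class ReadOnlyToyVariable:
    """Hypothetical sketch; not how netcdf4-python is implemented."""

    def __init__(self, n_rows, n_cols):
        # "On-disk" storage; None plays the role of a missing/fill value.
        self._disk = [[None] * n_cols for _ in range(n_rows)]

    def __getitem__(self, key):
        row = key[0] if isinstance(key, tuple) else key
        # A tuple is immutable, so assigning into the returned value fails
        # loudly instead of modifying a temporary copy.
        return tuple(self._disk[row])

    def __setitem__(self, key, value):
        row = key[0] if isinstance(key, tuple) else key
        self._disk[row] = list(value)


var = ReadOnlyToyVariable(2, 3)

try:
    var[0][:] = [1.0, 2.0, 3.0]  # chained write now raises
    chained_write_raised = False
except TypeError:
    chained_write_raised = True

var[0, :] = [1.0, 2.0, 3.0]  # the single-subscript form still works
print(chained_write_raised, var[0])
```

The obvious cost is that any existing code which reads a slice and then modifies it in memory would start failing, so this is a trade-off rather than a clear fix.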
Never mind my previous question. I misunderstood the above comment. But this implies it is numpy which silently fails, correct?
Edit: Looking at the source code, I see now what's happening. This is indeed tricky to catch. Hmm ...
numpy is not failing - it's just not returning what you expect.
Yeah, hence my edit. I looked at the source code after writing the comment.