cf-python
cf-python copied to clipboard
Unable to index aggregated fields when one has a size 1 dimension (ValueError: 'dim1' is not in list)
Hi, I am (for the second time today, sorry!) asking for help with an odd issue I have been having.
Preface: I am trying to convert pp files to netcdf4 using cf-python. Since I am working with hourly model data N96, I can only extract 24 hour records of a single stash item, thus the need for aggregating many files.
The model handles the hourly fields oddly - the first hour of each day is 01:00, so to get the 00:00 I need to retrieve it from the previous day. This is less a problem and more an annoyance, but it does mean the very first 00:00 on the 1st of a year is actually stored under the 31st of the previous year.
In my case, I am trying to convert data from 2014, which means the very first hour is actually in a 31/12/2013 file. I have made up scripts which account for this, and select the files with overlaps to ensure I get all the hours, but the very first day is what is causing the problem. Unfortunately, when I try to aggregate the 00:00 hour (a separate file with a single hour) onto the rest of January, the aggregation works, but the field I get cannot be indexed at all, which means I cannot then select only hours that are in the correct month (to prevent overlaps).
For example:
f = cf.read(FILELIST)[0]
f[f.coord("time").month==1]
OR
f[0]
Both my month selection and even a simple indexing for [0] return a ValueError.
I have tried a number of things to combat this which I will outline below, but first I receive the same error every time:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-100-d8711d29ed7d> in <module>
----> 1 f[0]
~/.conda/envs/cf-2/lib/python3.9/site-packages/cf/field.py in __getitem__(self, indices)
477 new = new.roll(iaxis, shift)
478 else:
--> 479 new = self.copy()
480
481 # ------------------------------------------------------------
~/.conda/envs/cf-2/lib/python3.9/site-packages/cfdm/core/abstract/propertiesdata.py in copy(self, data)
185
186 """
--> 187 return type(self)(source=self, copy=True, _use_data=data)
188
189 def del_data(self, default=ValueError()):
~/.conda/envs/cf-2/lib/python3.9/site-packages/cf/field.py in __init__(self, properties, source, copy, _use_data)
369
370 """
--> 371 super().__init__(
372 properties=properties,
373 source=source,
~/.conda/envs/cf-2/lib/python3.9/site-packages/cfdm/field.py in __init__(self, properties, source, copy, _use_data)
141 """
142 # Initialise the new field with attributes and CF properties
--> 143 core.Field.__init__(
144 self,
145 properties=properties,
~/.conda/envs/cf-2/lib/python3.9/site-packages/cfdm/core/field.py in __init__(self, properties, source, copy, _use_data)
124
125 if data is not None and _use_data:
--> 126 self.set_data(data, data_axes, copy=copy)
127 elif data_axes is not None:
128 self.set_data_axes(data_axes)
~/.conda/envs/cf-2/lib/python3.9/site-packages/cfdm/decorators.py in inplace_wrapper(self, *args, **kwargs)
47 self.INPLACE_ENABLED_PLACEHOLDER = self.copy()
48
---> 49 processed_copy = operation_method(self, *args, **kwargs)
50
51 if is_inplace:
~/.conda/envs/cf-2/lib/python3.9/site-packages/cf/field.py in set_data(self, data, axes, set_axes, copy, inplace)
11513 ]
11514 if cyclic_axes:
> 11515 data.cyclic(cyclic_axes, True)
11516
11517 return f
~/.conda/envs/cf-2/lib/python3.9/site-packages/cf/data/data.py in cyclic(self, axes, iscyclic)
10554 data_axes = self._axes
10555
> 10556 old = set([data_axes.index(axis) for axis in cyclic_axes])
10557
10558 if axes is None:
~/.conda/envs/cf-2/lib/python3.9/site-packages/cf/data/data.py in <listcomp>(.0)
10554 data_axes = self._axes
10555
> 10556 old = set([data_axes.index(axis) for axis in cyclic_axes])
10557
10558 if axes is None:
ValueError: 'dim1' is not in list
I am not sure what "dim1" is (I assume it's used internally).
I have tried only using two files:
[<CF Field: id%UM_m01s50i156_vn1105(time(1), latitude(144), longitude(192)) kg m-2 s-1>,
<CF Field: id%UM_m01s50i156_vn1105(time(24), latitude(144), longitude(192)) kg m-2 s-1>]
(Here, I have used the unsqueeze option to intentionally ensure that there are three dimensions, it doesn't seem to make a difference to the error though!)
Oddly enough, when I aggregate the files, it generates a perfectly "fine" field:
cf.aggregate(f)
[<CF Field: id%UM_m01s50i156_vn1105(time(25), latitude(144), longitude(192)) kg m-2 s-1>]
But it cannot be indexed.
I have tried using cf-python 3.4.0, 3.10.0, 3.11.0 (and corresponding cfdm versions as installed using conda) None of them varied in behaviour - the error was the same.
When I pick files that are within one year, the error doesn't show up, it only does for this singleton file from the previous year.
As far as I can tell, the pp files themselves are not corrupted or broken, and I can extract a perfectly fine numpy array from each one, and the dimensions look alright, but just in case, here is the dump result on the small file:
-----------------------------------------------------------
Field: id%UM_m01s50i156_vn1105 (ncvar%UM_m01s50i156_vn1105)
-----------------------------------------------------------
Conventions = 'CF-1.8'
_FillValue = -1073741824.0
history = 'Converted from UM/PP by cf-python v3.10.0'
lbproc = '0'
lbtim = '11'
long_name = 'NO surf emissions (kg m-2 s-1)'
runid = 'aaaaa'
source = 'UM vn1105'
stash_code = '50156'
submodel = '1'
um_stash_source = 'm01s50i156'
units = 'kg m-2 s-1'
Data(time(1), latitude(144), longitude(192)) = [[[0.0, ..., 2.6443517514573855e-13]]] kg m-2 s-1
Cell Method: time(1): point
Domain Axis: latitude(144)
Domain Axis: longitude(192)
Domain Axis: time(1)
Dimension coordinate: time
axis = 'T'
calendar = 'gregorian'
standard_name = 'time'
units = 'days since 2014-1-1'
Data(time(1)) = [2014-01-01 00:00:00] gregorian
Dimension coordinate: latitude
axis = 'Y'
standard_name = 'latitude'
units = 'degrees_north'
Data(latitude(144)) = [-89.375, ..., 89.375] degrees_north
Bounds:units = 'degrees_north'
Bounds:Data(latitude(144), 2) = [[-90.0, ..., 90.0]] degrees_north
Dimension coordinate: longitude
axis = 'X'
standard_name = 'longitude'
units = 'degrees_east'
Data(longitude(192)) = [0.9375, ..., 359.0625] degrees_east
Bounds:units = 'degrees_east'
Bounds:Data(longitude(192), 2) = [[0.0, ..., 360.0]] degrees_east
This has pretty much stumped me - and I have no idea how to solve it. While I can simply ditch the first hour and everything works alright, you can probably imagine that isn't the most satisfying solution...
Does anyone have an idea of what might be causing this? (The ideal result is where I've simply messed something up!)
Since I'm working on JASMIN, I can happily share the two pp files for debugging if need be.
Hi Matthew,
Thanks for reporting this, and the other issue - I'll have a look tomorrow ....