cf-python icon indicating copy to clipboard operation
cf-python copied to clipboard

Unable to index aggregated fields when one has a size 1 dimension (ValueError: 'dim1' is not in list)

Open matthew-shin opened this issue 4 years ago • 1 comments

Hi, I am (for the second time today, sorry!) asking for help with an odd issue I have been having.

Preface: I am trying to convert pp files to netcdf4 using cf-python. Since I am working with hourly model data N96, I can only extract 24 hour records of a single stash item, thus the need for aggregating many files.

The model handles the hourly fields oddly - the first hour of each day is 01:00, so to get the 00:00 I need to retrieve it from the previous day. This is less a problem and more an annoyance, but it does mean the very first 00:00 on the 1st of a year is actually stored under the 31st of the previous year.

In my case, I am trying to convert data from 2014, which means the very first hour is actually in a 31/12/2013 file. I have made up scripts which account for this, and select the files with overlaps to ensure I get all the hours, but the very first day is what is causing the problem. Unfortunately, when I try to aggregate the 00:00 hour (a separate file with a single hour) onto the rest of January, the aggregation works, but the field I get cannot be indexed at all, which means I cannot then select only hours that are in the correct month (to prevent overlaps).

For example:

f = cf.read(FILELIST)[0]
f[f.coord("time").month==1]
OR
f[0]

Both my month selection and even a simple indexing for [0] return a ValueError.

I have tried a number of things to combat this which I will outline below, but first I receive the same error every time:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-100-d8711d29ed7d> in <module>
----> 1 f[0]

~/.conda/envs/cf-2/lib/python3.9/site-packages/cf/field.py in __getitem__(self, indices)
    477                 new = new.roll(iaxis, shift)
    478         else:
--> 479             new = self.copy()
    480
    481         # ------------------------------------------------------------

~/.conda/envs/cf-2/lib/python3.9/site-packages/cfdm/core/abstract/propertiesdata.py in copy(self, data)
    185
    186         """
--> 187         return type(self)(source=self, copy=True, _use_data=data)
    188
    189     def del_data(self, default=ValueError()):

~/.conda/envs/cf-2/lib/python3.9/site-packages/cf/field.py in __init__(self, properties, source, copy, _use_data)
    369
    370         """
--> 371         super().__init__(
    372             properties=properties,
    373             source=source,

~/.conda/envs/cf-2/lib/python3.9/site-packages/cfdm/field.py in __init__(self, properties, source, copy, _use_data)
    141         """
    142         # Initialise the new field with attributes and CF properties
--> 143         core.Field.__init__(
    144             self,
    145             properties=properties,

~/.conda/envs/cf-2/lib/python3.9/site-packages/cfdm/core/field.py in __init__(self, properties, source, copy, _use_data)
    124
    125         if data is not None and _use_data:
--> 126             self.set_data(data, data_axes, copy=copy)
    127         elif data_axes is not None:
    128             self.set_data_axes(data_axes)

~/.conda/envs/cf-2/lib/python3.9/site-packages/cfdm/decorators.py in inplace_wrapper(self, *args, **kwargs)
     47                 self.INPLACE_ENABLED_PLACEHOLDER = self.copy()
     48
---> 49             processed_copy = operation_method(self, *args, **kwargs)
     50
     51             if is_inplace:

~/.conda/envs/cf-2/lib/python3.9/site-packages/cf/field.py in set_data(self, data, axes, set_axes, copy, inplace)
  11513                 ]
  11514                 if cyclic_axes:
> 11515                     data.cyclic(cyclic_axes, True)
  11516
  11517         return f

~/.conda/envs/cf-2/lib/python3.9/site-packages/cf/data/data.py in cyclic(self, axes, iscyclic)
  10554         data_axes = self._axes
  10555
> 10556         old = set([data_axes.index(axis) for axis in cyclic_axes])
  10557
  10558         if axes is None:

~/.conda/envs/cf-2/lib/python3.9/site-packages/cf/data/data.py in <listcomp>(.0)
  10554         data_axes = self._axes
  10555
> 10556         old = set([data_axes.index(axis) for axis in cyclic_axes])
  10557
  10558         if axes is None:

ValueError: 'dim1' is not in list

I am not sure what "dim1" is (I assume it's used internally).

I have tried only using two files:

[<CF Field: id%UM_m01s50i156_vn1105(time(1), latitude(144), longitude(192)) kg m-2 s-1>,
 <CF Field: id%UM_m01s50i156_vn1105(time(24), latitude(144), longitude(192)) kg m-2 s-1>]

(Here, I have used the unsqueeze option to intentionally ensure that there are three dimensions, it doesn't seem to make a difference to the error though!)

Oddly enough, when I aggregate the files, it generates a perfectly "fine" field:

cf.aggregate(f)
[<CF Field: id%UM_m01s50i156_vn1105(time(25), latitude(144), longitude(192)) kg m-2 s-1>]

But it cannot be indexed.

I have tried using cf-python 3.4.0, 3.10.0, 3.11.0 (and corresponding cfdm versions as installed using conda) None of them varied in behaviour - the error was the same.

When I pick files that are within one year, the error doesn't show up, it only does for this singleton file from the previous year.

As far as I can tell, the pp files themselves are not corrupted or broken, and I can extract a perfectly fine numpy array from each one, and the dimensions look alright, but just in case, here is the dump result on the small file:

-----------------------------------------------------------
Field: id%UM_m01s50i156_vn1105 (ncvar%UM_m01s50i156_vn1105)
-----------------------------------------------------------
Conventions = 'CF-1.8'
_FillValue = -1073741824.0
history = 'Converted from UM/PP by cf-python v3.10.0'
lbproc = '0'
lbtim = '11'
long_name = 'NO surf emissions (kg m-2 s-1)'
runid = 'aaaaa'
source = 'UM vn1105'
stash_code = '50156'
submodel = '1'
um_stash_source = 'm01s50i156'
units = 'kg m-2 s-1'

Data(time(1), latitude(144), longitude(192)) = [[[0.0, ..., 2.6443517514573855e-13]]] kg m-2 s-1

Cell Method: time(1): point

Domain Axis: latitude(144)
Domain Axis: longitude(192)
Domain Axis: time(1)

Dimension coordinate: time
    axis = 'T'
    calendar = 'gregorian'
    standard_name = 'time'
    units = 'days since 2014-1-1'
    Data(time(1)) = [2014-01-01 00:00:00] gregorian

Dimension coordinate: latitude
    axis = 'Y'
    standard_name = 'latitude'
    units = 'degrees_north'
    Data(latitude(144)) = [-89.375, ..., 89.375] degrees_north
    Bounds:units = 'degrees_north'
    Bounds:Data(latitude(144), 2) = [[-90.0, ..., 90.0]] degrees_north

Dimension coordinate: longitude
    axis = 'X'
    standard_name = 'longitude'
    units = 'degrees_east'
    Data(longitude(192)) = [0.9375, ..., 359.0625] degrees_east
    Bounds:units = 'degrees_east'
    Bounds:Data(longitude(192), 2) = [[0.0, ..., 360.0]] degrees_east


This has pretty much stumped me - and I have no idea how to solve it. While I can simply ditch the first hour and everything works alright, you can probably imagine that isn't the most satisfying solution...

Does anyone have an idea of what might be causing this? (The ideal result is where I've simply messed something up!)

Since I'm working on JASMIN, I can happily share the two pp files for debugging if need be.

matthew-shin avatar Oct 13 '21 17:10 matthew-shin

Hi Matthew,

Thanks for reporting this, and the other issue - I'll have a look tomorrow ....

davidhassell avatar Oct 13 '21 18:10 davidhassell