iris
cube.collapsed fails with multi-dimensional string coordinates
Hi, I have a cube like the one below, with data from multi-ensemble-member climate model runs covering different years. The 'Expt ID' coordinate contains the run ID corresponding to each ensemble member for each year. I get an error when I do cube.collapsed('year', iris.analysis.MEAN).
I did previously collapse the cube over a 'season' coordinate (since removed), where each season had three time values, so perhaps this issue only arises when an entire dimension is collapsed?
    print(cube)

    air_temperature / (K)     (time: 30; Ens member: 15; latitude: 145; longitude: 192)
         Dimension coordinates:
              time                 x              -              -               -
              Ens member           -              x              -               -
              latitude             -              -              x               -
              longitude            -              -              -               x
         Auxiliary coordinates:
              season_year          x              -              -               -
              year                 x              -              -               -
              Expt ID              x              x              -               -

    cube_mean = cube.collapsed('year', iris.analysis.MEAN)
The full error message is given below. The problem seems to be that iris joins all the strings in 'Expt ID' (across both of its dimensions) into one long string, giving a coordinate of length 1, and then finds that this does not match the 15-element 'Ens member' dimension it is still mapped to.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-13-43764f0e4a7d> in <module>()
----> 1 cube_seasmean.collapsed('year',iris.analysis.MEAN)
/network/home/aopp/watson/anaconda2/envs/main/lib/python2.7/site-packages/iris/cube.pyc in collapsed(self, coords, aggregator, **kwargs)
3253 local_dims = [coord_dims.index(dim) for dim in
3254 dims_to_collapse if dim in coord_dims]
-> 3255 collapsed_cube.replace_coord(coord.collapsed(local_dims))
3256
3257 untouched_dims = sorted(untouched_dims)
/network/home/aopp/watson/anaconda2/envs/main/lib/python2.7/site-packages/iris/cube.pyc in replace_coord(self, new_coord)
1181 self.add_dim_coord(new_coord, dims[0])
1182 else:
-> 1183 self.add_aux_coord(new_coord, dims)
1184
1185 for factory in self.aux_factories:
/network/home/aopp/watson/anaconda2/envs/main/lib/python2.7/site-packages/iris/cube.pyc in add_aux_coord(self, coord, data_dims)
964 if self.coords(coord): # TODO: just fail on duplicate object
965 raise ValueError('Duplicate coordinates are not permitted.')
--> 966 self._add_unique_aux_coord(coord, data_dims)
967
968 def _check_multi_dim_metadata(self, metadata, data_dims):
/network/home/aopp/watson/anaconda2/envs/main/lib/python2.7/site-packages/iris/cube.pyc in _add_unique_aux_coord(self, coord, data_dims)
996
997 def _add_unique_aux_coord(self, coord, data_dims):
--> 998 data_dims = self._check_multi_dim_metadata(coord, data_dims)
999 self._aux_coords_and_dims.append([coord, data_dims])
1000
/network/home/aopp/watson/anaconda2/envs/main/lib/python2.7/site-packages/iris/cube.pyc in _check_multi_dim_metadata(self, metadata, data_dims)
988 raise ValueError(msg.format(dim, self.shape[dim],
989 metadata.name(), i,
--> 990 metadata.shape[i]))
991 elif metadata.shape != (1,):
992 msg = 'Missing data dimensions for multi-valued {} {!r}'
ValueError: Unequal lengths. Cube dimension 0 => 15; metadata 'Expt ID' dimension 0 => 1.
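To make the mismatch in that last line concrete, here is a small pure-Python sketch with no iris involved. The 30 x 15 grid of made-up run IDs is an assumption standing in for 'Expt ID'; joining every value across both dimensions (which is what the error message suggests collapsed() does) leaves one string, while the dimension that survives collapsing 'year' still has 15 entries.

```python
# Hypothetical run IDs laid out like 'Expt ID': 30 times x 15 ensemble members.
n_time, n_ens = 30, 15
expt_id = [[f"run{t:02d}_{e:02d}" for e in range(n_ens)] for t in range(n_time)]

# Joining every value, across both dimensions, leaves a single string ...
joined_all = ["|".join(s for row in expt_id for s in row)]

# ... but once time is collapsed, the cube's dimension 0 is 'Ens member'.
remaining_dim_len = n_ens

print(len(joined_all), remaining_dim_len)  # 1 15
```

This is the same "dimension 0 => 15; metadata 'Expt ID' dimension 0 => 1" comparison that the traceback reports.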
A quick workaround is to join the individual strings explicitly into one string, remove the string coordinate, and re-add the joined string as a scalar coordinate. (I don't know iris well enough to say how robust this is, but it works for my case.)
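    import iris
    import iris.analysis

    # Replace each multi-dimensional string coordinate with a scalar coordinate
    # holding all of its values joined by '|'.
    for coord in cube.aux_coords:  # aux_coords is a tuple, so removing coords while looping is safe
        if coord.ndim > 1 and coord.dtype.kind in ('S', 'U'):  # bytes or unicode strings
            new_str = '|'.join(coord.points.ravel().astype(str))
            new_coord = iris.coords.AuxCoord(new_str, attributes=coord.attributes, long_name=coord.long_name,
                standard_name=coord.standard_name, units=coord.units, var_name=coord.var_name)
            cube.remove_coord(coord)
            cube.add_aux_coord(new_coord)  # no data_dims given, so it is added as a scalar coord

    cube_mean = cube.collapsed('year', iris.analysis.MEAN)  # now this works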
It would be nice if iris did something like this in cube.collapsed(). Even better would be a method that collapses 'Expt ID' only along the dimension being collapsed, so that the association of 'Expt ID' values with the 'Ens member' dimension is maintained.
The aggregated_by method has string handling that does that, so I would say it's desirable to have consistent behaviour in collapsed. I also think it should be relatively simple to implement.
The relevant handling for aggregated_by looks like this:
https://github.com/SciTools/iris/blob/c9506e6a41282e91a27101905cc4e9d3cb866e4b/lib/iris/analysis/__init__.py#L2182-L2217
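For illustration, the behaviour requested above (joining string values only along the dimension being collapsed) can be sketched in pure Python. This is a minimal sketch, not iris code: collapse_string_points is a hypothetical helper, nested lists stand in for a 2-D coordinate's points, and the '|' separator is borrowed from the workaround above.

```python
def collapse_string_points(points, axis, sep="|"):
    """Hypothetical helper (not part of iris): join a 2-D grid of strings
    along one axis only.

    points: list of rows, standing in for a 2-D string coordinate's points.
    axis:   0 collapses the rows (e.g. time), 1 collapses the columns.
    Returns a 1-D list whose length matches the dimension that is kept.
    """
    if axis == 0:
        # zip(*points) yields one tuple per column, i.e. per kept index
        return [sep.join(column) for column in zip(*points)]
    if axis == 1:
        return [sep.join(row) for row in points]
    raise ValueError(f"axis must be 0 or 1, got {axis}")


expt_id = [["a1", "b1", "c1"],   # year 1: one run ID per ensemble member
           ["a2", "b2", "c2"]]   # year 2
print(collapse_string_points(expt_id, axis=0))  # ['a1|a2', 'b1|b2', 'c1|c2']
```

Because only the collapsed axis is joined, each ensemble member keeps its own list of run IDs, so the result still lines up with the 'Ens member' dimension instead of shrinking to length 1.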
In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity.
If this issue is still important to you, then please comment on this issue and the stale label will be removed.
Otherwise this issue will be automatically closed in 28 days' time.
I believe this bug is very fixable; it just needs someone to find the time, so I say we leave this issue open.
I have proposed a fix for this at #4294.
I still think we should make this work, but if we don't, we should at least raise a more decipherable error message.
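As a rough sketch of what a clearer error could look like, the hypothetical pre-check below (not part of iris; the function name and its arguments are assumptions) flags a string coordinate that spans both a collapsed and an uncollapsed dimension before any shapes are compared, rather than letting the problem surface later as "Unequal lengths".

```python
def check_string_coord_collapsible(name, coord_dims, dims_to_collapse, is_string):
    """Hypothetical pre-check (not part of iris) run before collapsing a coord.

    Raises a descriptive ValueError when a string coordinate spans both a
    dimension being collapsed and a dimension being kept.
    """
    collapsed = [d for d in coord_dims if d in dims_to_collapse]
    kept = [d for d in coord_dims if d not in dims_to_collapse]
    if is_string and collapsed and kept:
        raise ValueError(
            f"Cannot collapse string coordinate {name!r} over dimension(s) "
            f"{collapsed}: it also spans uncollapsed dimension(s) {kept}. "
            "Remove it, or replace it with a scalar coordinate, before collapsing."
        )


# 'Expt ID' spans (time, Ens member) = dims (0, 1); collapsing time only:
try:
    check_string_coord_collapsible("Expt ID", (0, 1), (0,), is_string=True)
except ValueError as err:
    print(err)
```

The check only inspects which dimensions the coordinate spans, so it would fire for the cube in this issue while leaving scalar or fully collapsed string coordinates alone.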