Cannot save non-ASCII characters to NetCDF
🐛 Bug Report
From @gavinevans
Attempting to save a Cube that includes a string AuxCoord containing non-ASCII characters (e.g. accented letters such as "è" or "ü") raises the following exception:
UnicodeEncodeError: 'ascii' codec can't encode character '\xe8' in position 0: ordinal not in range(128)
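For what it's worth, the message itself can be reproduced in plain Python, without Iris or netCDF4. This is only a minimal sketch, and it rests on an assumption: the "position 0" in the message suggests the string is being encoded one character at a time. That is inferred from the error text, not confirmed from the saver code.

```python
# Standalone illustration only; this does not call Iris or netCDF4.
name = "Robièi"
message = None

try:
    # Assumption: each character is encoded separately, as it would be
    # when filling an array of single-byte characters.
    encoded = [char.encode("ascii") for char in name]
except UnicodeEncodeError as err:
    message = str(err)
    print(message)
```

Encoding the whole string at once with `name.encode("ascii")` would report position 4 instead, since "è" is the fifth character of "Robièi".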
How To Reproduce
Steps to reproduce the behaviour:
import iris
from iris.coords import AuxCoord, DimCoord
from iris.cube import Cube
spot_index = DimCoord([0, 1], long_name='site_index', units=1)
station_name = AuxCoord(["Robièi", "Mühleberg"], long_name="station_name")
# This one works:
# station_name = AuxCoord(["Robiei", "Muhleberg"], long_name="station_name")
cube = Cube(
    [3, 4],
    dim_coords_and_dims=[(spot_index, 0)],
    aux_coords_and_dims=[(station_name, 0)],
)
iris.save(cube, "tmp.nc")
Expected behaviour
Should save with no exception (as happens when using the commented line above).
Environment
- OS & Version: RHEL7
- Iris Version: tested with v3.2.1.post0 and v3.4.0
Additional context
Related:
- #4101
- #4412
I think the fix will hinge on allowing for the extra bytes needed to store encoded Unicode characters. We currently divide the length by 4, which I think means we always assume that each character of a Unicode string can be stored as a single ASCII byte:
https://github.com/SciTools/iris/blob/fc302c9c08c292cb2075d2dd249bcbdfacf08da8/lib/iris/fileformats/netcdf/saver.py#L1881-L1883
Changing this could have loading consequences too?
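To put numbers on the "extra bytes" point, here is a rough sketch using only the standard library. It assumes UTF-8 would be the target encoding, which this issue does not settle, and `width_summary` is a hypothetical helper written for this illustration, not an Iris function. The UTF-32 line is there because a fixed 4 bytes per character is what makes "divide by 4" yield a character count.

```python
# Illustration only: compare character counts with encoded byte counts.
names = ["Robièi", "Mühleberg"]


def width_summary(strings, encoding="utf-8"):
    """Return (max characters, max encoded bytes) over the given strings.

    Hypothetical helper for this sketch; not part of Iris.
    """
    max_chars = max(len(s) for s in strings)
    max_bytes = max(len(s.encode(encoding)) for s in strings)
    return max_chars, max_bytes


# A fixed-width 4-byte encoding: byte length // 4 equals the character count.
chars_from_utf32 = [len(s.encode("utf-32-le")) // 4 for s in names]

max_chars, max_bytes = width_summary(names)
print(chars_from_utf32)      # character counts per name
print(max_chars, max_bytes)  # widest name in characters vs. UTF-8 bytes
```

Here the widest name is 9 characters but 10 bytes in UTF-8, so a string dimension sized from the character count alone would be one byte too short for "Mühleberg".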
Traceback with Iris v3.4:
Traceback (most recent call last):
  File ".../iris/lib/2023-01-03_gavin.py", line 17, in <module>
    iris.save(cube, "tmp.nc")
  File ".../iris/lib/iris/io/__init__.py", line 457, in save
    saver(source, target, **kwargs)
  File ".../iris/lib/iris/fileformats/netcdf/saver.py", line 2754, in save
    sman.write(
  File ".../iris/lib/iris/fileformats/netcdf/saver.py", line 755, in write
    self._add_aux_coords(cube, cf_var_cube, cube_dimensions)
  File ".../iris/lib/iris/fileformats/netcdf/saver.py", line 1088, in _add_aux_coords
    return self._add_inner_related_vars(
  File ".../iris/lib/iris/fileformats/netcdf/saver.py", line 1053, in _add_inner_related_vars
    cf_name = self._create_generic_cf_array_var(
  File ".../iris/lib/iris/fileformats/netcdf/saver.py", line 1917, in _create_generic_cf_array_var
    new_data[index_slice] = list(
UnicodeEncodeError: 'ascii' codec can't encode character '\xe8' in position 0: ordinal not in range(128)