Cannot save non-ASCII characters to NetCDF

Open trexfeathers opened this issue 1 year ago • 6 comments

🐛 Bug Report

From @gavinevans

Attempting to save a Cube including a string AuxCoord with non-ASCII characters (i.e. Unicode characters) raises the following exception:

UnicodeEncodeError: 'ascii' codec can't encode character '\xe8' in position 0: ordinal not in range(128)

How To Reproduce

Steps to reproduce the behaviour:

import iris
from iris.coords import AuxCoord, DimCoord
from iris.cube import Cube

spot_index = DimCoord([0, 1], long_name='site_index', units=1)

station_name = AuxCoord(["Robièi", "Mühleberg"], long_name="station_name")
# This one works:
# station_name = AuxCoord(["Robiei", "Muhleberg"], long_name="station_name")

cube = Cube(
    [3, 4],
    dim_coords_and_dims=[(spot_index, 0)],
    aux_coords_and_dims=[(station_name, 0)]
)

iris.save(cube, "tmp.nc")

Expected behaviour

Should save with no exception (as happens when using the commented line above).

Environment

  • OS & Version: RHEL7
  • Iris Version: tested with v3.2.1.post0 and v3.4.0

Additional context

Related:

  • #4101
  • #4412

I think the fix will hinge on allowing for the extra bytes needed to store encoded Unicode characters. We currently divide the length by 4, which I think means we always assume that a Unicode string can be converted to an ASCII one, i.e. one byte per character:

https://github.com/SciTools/iris/blob/fc302c9c08c292cb2075d2dd249bcbdfacf08da8/lib/iris/fileformats/netcdf/saver.py#L1881-L1883

Changing this could have loading consequences too?
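
To illustrate the size mismatch, here is a minimal pure-Python sketch (not Iris code; the variable names are made up for illustration). It assumes the divide-by-4 step amounts to counting characters, since a NumPy "U" dtype uses 4 bytes per character, and compares that count with the bytes a UTF-8 encoding would need for the same station names:

```python
# Station names from the reproducer, written with escapes so the
# code points are unambiguous: "Robièi" and "Mühleberg".
names = ["Robi\u00e8i", "M\u00fchleberg"]

# Character count of the longest string. This is what dividing a
# NumPy "U" itemsize (4 bytes per character) by 4 would yield.
char_depth = max(len(s) for s in names)

# Bytes needed for the longest string once encoded as UTF-8, where
# each non-ASCII character takes more than one byte.
utf8_depth = max(len(s.encode("utf-8")) for s in names)


def fits_ascii(text):
    """Return True if text can be encoded as ASCII without error."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


ascii_ok = [fits_ascii(s) for s in names]

print(char_depth, utf8_depth, ascii_ok)
```

If the character dimension were sized from the encoded byte length instead, it would need to be at least `utf8_depth` long here. Whether UTF-8 is the right encoding for the saver is a separate question that this sketch does not answer.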

Traceback with Iris v3.4:

Traceback (most recent call last):
  File ".../iris/lib/2023-01-03_gavin.py", line 17, in <module>
    iris.save(cube, "tmp.nc")
  File ".../iris/lib/iris/io/__init__.py", line 457, in save
    saver(source, target, **kwargs)
  File ".../iris/lib/iris/fileformats/netcdf/saver.py", line 2754, in save
    sman.write(
  File ".../iris/lib/iris/fileformats/netcdf/saver.py", line 755, in write
    self._add_aux_coords(cube, cf_var_cube, cube_dimensions)
  File ".../iris/lib/iris/fileformats/netcdf/saver.py", line 1088, in _add_aux_coords
    return self._add_inner_related_vars(
  File ".../iris/lib/iris/fileformats/netcdf/saver.py", line 1053, in _add_inner_related_vars
    cf_name = self._create_generic_cf_array_var(
  File ".../iris/lib/iris/fileformats/netcdf/saver.py", line 1917, in _create_generic_cf_array_var
    new_data[index_slice] = list(
UnicodeEncodeError: 'ascii' codec can't encode character '\xe8' in position 0: ordinal not in range(128)

trexfeathers, Jan 04 '23 11:01