
Error opening netcdf in path with "special" characters

Open itati01 opened this issue 5 years ago • 11 comments

Hi,

I could not open a netCDF file whose path contained a German umlaut in Python 3 (FileNotFoundError: [Errno 2] No such file or directory). A workaround was to change into the folder and pass only the filename:

import netCDF4 as nc
s = "C:/path_with_ä/test.nc"
ncf = nc.Dataset(s) # error
import os
path, fn = os.path.split(s)
os.chdir(path)
ncf = nc.Dataset(fn) # worked

I tested v1.5.1.2 under Python 3.7.2 (64-bit, Windows 10), installed via conda-forge. Interestingly, using a unicode string in Python 2 worked well with netcdf4 v1.4.1.
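
The workaround above leaves the process inside the data folder. A minimal sketch that restores the previous working directory afterwards could look like this. `working_directory` and `open_by_basename` are hypothetical helper names, and `opener` stands for any callable that takes a filename, such as `nc.Dataset`. It only wraps the chdir trick from this report and has not been verified against netCDF4 on Windows.

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the current working directory."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)  # always return to where we started


def open_by_basename(full_path, opener):
    """Call opener(filename) from inside the file's own folder."""
    folder, name = os.path.split(full_path)
    with working_directory(folder or "."):
        return opener(name)


# Hypothetical usage with the path from this report:
# import netCDF4 as nc
# ncf = open_by_basename("C:/path_with_ä/test.nc", nc.Dataset)
```

The library only ever sees the bare filename, so the non-ASCII folder name is resolved by Python's own `os.chdir` rather than by the underlying C library.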

itati01, Jun 24 '19 20:06

This works for me in Python 3.6

from netCDF4 import Dataset
filename = '\xc3\xbc.nc'
nc = Dataset(filename, 'w')
nc.close()

jswhit, Jun 26 '19 13:06

Works for me as well. However, the file shows up as "Ã¼.nc" in Windows Explorer but as "Ã¼.nc" under Linux (the Linux-created file looks the same in Windows Explorer; I used Ubuntu 18.04 via WSL).

Now, filename = "ää.nc" results in "Ã¤Ã¤.nc" under Windows but "ää.nc" under Linux. To simulate the behaviour from my original report, I used filename = "äää/aaa.nc", which raises an error under Windows (Errno 13: Permission denied) while everything is fine under Linux (the path shows up correctly in Windows Explorer as well). os.makedirs("äää") works as expected.

itati01, Jun 26 '19 14:06

Not a Unicode expert, but netcdf4-python uses utf-8 encoding by default (can be changed with the encoding Dataset kwarg). Maybe Windows uses a different encoding?
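
As a rough illustration of what a mismatched encoding does to a filename, the sketch below encodes a name as UTF-8 and decodes the same bytes with a Windows ANSI code page. It assumes cp1252 (the Western European code page) and assumes the bytes reach an ANSI file API unchanged; neither is confirmed for this setup. `as_seen_by_ansi_api` is a hypothetical helper name.

```python
def as_seen_by_ansi_api(name, ansi_codepage="cp1252"):
    """Return how a UTF-8 encoded name reads when decoded as ANSI text."""
    utf8_bytes = name.encode("utf-8")
    return utf8_bytes.decode(ansi_codepage)


print(as_seen_by_ansi_api("ää.nc"))  # prints "Ã¤Ã¤.nc"
```

Each "ä" becomes the two bytes 0xC3 0xA4 in UTF-8, and cp1252 reads those as "Ã" and "¤", which produces a wrong but still valid file name.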

jswhit, Jun 26 '19 14:06

This is related to https://github.com/Unidata/netcdf4-python/issues/686.

There is actually a test for this on Windows (tst_filepath.py). I suggest you try using encoding=sys.getfilesystemencoding().
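
A minimal sketch of that suggestion follows. `open_with_fs_encoding` is a hypothetical helper, and the `dataset_cls` parameter exists only so the helper can be tried without netCDF4 installed; by default it uses `netCDF4.Dataset` and its `encoding` keyword. Whether this resolves the Windows problem is not established here.

```python
import sys


def open_with_fs_encoding(path, mode="r", dataset_cls=None):
    """Open path, encoding the filename with the filesystem encoding."""
    if dataset_cls is None:
        # Imported lazily so the helper can be exercised without netCDF4.
        from netCDF4 import Dataset as dataset_cls
    return dataset_cls(path, mode, encoding=sys.getfilesystemencoding())


# Hypothetical usage:
# ncf = open_with_fs_encoding("C:/path_with_ä/test.nc")
```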

jswhit, Jun 26 '19 14:06

It does look like a Unicode issue, although I am far from being an expert. sys.getfilesystemencoding() returns "utf-8" in Python 3, under both Linux and Windows.

In Python 2, filename = u"ää.nc" works under Windows, but filename = u"ää/ää.nc" works only if the folder "ää" already exists, e.g.

import os
filename = u'ää/ää.nc'
path, fn = os.path.split(filename)
if not os.path.exists(path):
    os.makedirs(path)    # os handles non-ASCII characters correctly
from netCDF4 import Dataset
nc = Dataset(filename, 'w') # fn is also fine
nc.close()

The path names also appear correctly in Win Explorer. sys.getfilesystemencoding() returns "mbcs" here.
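
To compare setups side by side, a small report of the encodings Python itself uses can be collected as sketched below. `encoding_report` is a hypothetical helper name; the script uses only standard-library calls and is written to run under both Python 2 and Python 3.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python uses for file names and text."""
    return {
        "python": sys.version.split()[0],
        "platform": sys.platform,
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(False),
        "default": sys.getdefaultencoding(),
    }


for key, value in sorted(encoding_report().items()):
    print("%10s: %s" % (key, value))
```

This only reports what Python uses for its own file operations; it says nothing about how the netCDF or HDF5 C libraries interpret the bytes they receive.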

itati01, Jun 26 '19 15:06

Is the problem resolved for Python 3 on Windows? I'm not clear on what works and what doesn't.

jswhit, Jun 27 '19 15:06

Sorry for the confusion. No, the problem is not solved for Python 3 on Windows. Python 2 on Windows works partly, and Python 3 on Linux works fully. On Windows, the Unicode issues seem to produce wrong but valid file names in Python 3 (your example) and invalid folder names in both Python 2 and 3 (my examples).

itati01, Jun 27 '19 17:06

OK, thanks for the clarification. Not having access to Windows, I'm not sure where to go from here. One question that comes to mind: does the same issue arise if you try to open a text file on Windows (independent of netcdf4-python)?

jswhit, Jun 27 '19 18:06

For reference

https://github.com/h5py/h5py/issues/839

Not sure if this is related or not, but it's a nice discussion of the general problem.

jswhit, Jun 28 '19 13:06

Also

https://forum.hdfgroup.org/t/non-english-characters-in-hdf5-file-name/4627/3

It seems clear that Unicode filenames are not yet fully supported in HDF5 on Windows.

jswhit, Jun 29 '19 15:06

Thanks for the link to the interesting discussion. Let's hope they fix this issue at some point. By the way, creating a folder with os.makedirs() and writing to a new text file with write() works as expected.
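
A check along those lines, independent of netcdf4-python, might look like the following sketch for Python 3. `roundtrip_text_file` is a hypothetical helper, and the folder and file names are only examples; it uses nothing beyond the standard library.

```python
import os
import tempfile


def roundtrip_text_file(base_dir, folder="äää", name="ää.txt", text="ok"):
    """Create folder/name under base_dir, write text, and read it back."""
    target_dir = os.path.join(base_dir, folder)
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, name)
    with open(target, "w", encoding="utf-8") as fh:
        fh.write(text)
    with open(target, encoding="utf-8") as fh:
        return fh.read(), os.listdir(base_dir)


with tempfile.TemporaryDirectory() as tmp:
    content, entries = roundtrip_text_file(tmp)
    print(content, entries)
```

If this round trip succeeds while `Dataset` fails on the same path, the problem is more likely in the C library layer than in Python's own file handling.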

itati01, Jul 01 '19 11:07