netcdf4-python
Error opening netcdf in path with "special" characters
Hi,
I could not open a netCDF file whose path contained a German umlaut in Python 3 (FileNotFoundError: [Errno 2] No such file or directory). A workaround was to change the working directory to the file's folder and open the file by its name only:
import os
import netCDF4 as nc
s = "C:/path_with_ä/test.nc"
ncf = nc.Dataset(s)  # fails with FileNotFoundError
# workaround: change into the file's folder and open it by name only
path, fn = os.path.split(s)
os.chdir(path)
ncf = nc.Dataset(fn)  # works
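The workaround above changes the working directory for the rest of the session. A minimal sketch that restores it afterwards, assuming netCDF4 only needs the relative name while the file is being opened (not verified on Windows); in_directory and open_via_basename are hypothetical helper names, not part of netCDF4:

```python
import contextlib
import os


@contextlib.contextmanager
def in_directory(path):
    """Temporarily switch the working directory, restoring it on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def open_via_basename(full_path, opener, *args, **kwargs):
    """Open full_path by changing into its folder and passing only the file name.

    Hypothetical helper: opener can be netCDF4.Dataset, the built-in open, etc.
    Extra positional and keyword arguments (e.g. the mode) are forwarded.
    """
    directory, filename = os.path.split(full_path)
    with in_directory(directory or os.curdir):
        return opener(filename, *args, **kwargs)
```

With the names from the report this would read ncf = open_via_basename("C:/path_with_ä/test.nc", nc.Dataset).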
I tested netcdf4-python v1.5.1.2 under Python 3.7.2 (64-bit, Windows 10), installed via conda-forge. Interestingly, passing a unicode string in Python 2 worked with netcdf4 v1.4.1.
This works for me in Python 3.6:
from netCDF4 import Dataset
filename = '\xc3\xbc.nc'
nc = Dataset(filename, 'w')
nc.close()
Works for me as well. However, when the file is created under Windows its name shows up garbled in Windows Explorer, whereas when it is created under Linux (I used Ubuntu 18.04 via WSL) it shows up as "ü.nc", both under Linux and in Windows Explorer.
Likewise, filename = "ää.nc" results in a garbled name under Windows but in "ää.nc" under Linux. To reproduce the original problem, I used filename = "äää/aaa.nc", which fails under Windows (Errno 13: Permission denied) while everything is fine under Linux (the path also shows up correctly in Windows Explorer). Creating the folder with os.makedirs("äää") works as expected.
Not a Unicode expert, but netcdf4-python uses UTF-8 encoding by default (this can be changed with the encoding keyword argument of Dataset). Maybe Windows uses a different encoding?
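To illustrate that hypothesis: if the UTF-8 bytes of a file name were interpreted by a C library in the Windows ANSI code page (assumed here to be cp1252, common on Western European systems), each umlaut would turn into two unrelated characters, giving a valid but wrong name. This standard-library sketch only reproduces that round trip; whether HDF5 actually reads file names this way on Windows is an assumption, and misread_as_ansi is a hypothetical helper.

```python
def misread_as_ansi(name, ansi_code_page="cp1252"):
    """Return the name seen by a consumer that decodes the UTF-8 bytes of
    name with an ANSI code page instead of UTF-8."""
    utf8_bytes = name.encode("utf-8")  # UTF-8 is netcdf4-python's default
    return utf8_bytes.decode(ansi_code_page)  # assumed ANSI-side interpretation


for original in ("ü.nc", "ää.nc"):
    print(original, "->", misread_as_ansi(original))
# ü.nc -> Ã¼.nc
# ää.nc -> Ã¤Ã¤.nc
```

Pure ASCII names survive unchanged, which fits the observation that only paths with special characters are affected.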
This is related to https://github.com/Unidata/netcdf4-python/issues/686.
There is actually a test for this on Windows (tst_filepath.py). I suggest you try using encoding=sys.getfilesystemencoding().
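A minimal sketch of that suggestion. The encoding keyword of Dataset is the one mentioned above; the temporary folder and the file name ää.nc are illustrative. If netCDF4 is not installed, the script only shows the bytes each encoding would produce.

```python
import os
import sys
import tempfile

try:
    from netCDF4 import Dataset
except ImportError:  # netCDF4 missing: skip the actual file creation
    Dataset = None

fs_encoding = sys.getfilesystemencoding()
filename = os.path.join(tempfile.mkdtemp(), "ää.nc")  # illustrative path

# Bytes the C library would receive with the default (UTF-8) and the suggested encoding.
encoded = {enc: filename.encode(enc, errors="replace") for enc in ("utf-8", fs_encoding)}
for enc, raw in encoded.items():
    print(enc, raw)

if Dataset is not None:
    # encoding= controls how netcdf4-python turns the str file name into bytes.
    with Dataset(filename, "w", encoding=fs_encoding) as ds:
        ds.title = "non-ASCII file name test"
```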
It does indeed look like a Unicode issue, although I am far from being an expert. sys.getfilesystemencoding() returns "utf-8" in Python 3, under both Linux and Windows.
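That matches PEP 529: since Python 3.6 the filesystem encoding on Windows is UTF-8, while Python 2 reports "mbcs", i.e. the ANSI code page. If, as assumed here, HDF5 opens files through the narrow (ANSI) Windows API, an encoding worth trying on Python 3 would be the ANSI code page itself, which locale.getpreferredencoding(False) usually reports on Windows (e.g. cp1252). This untested sketch, with the hypothetical helper ansi_encoded, only checks whether a path is representable in that encoding; characters outside the code page cannot be passed this way at all.

```python
import locale
import sys


def ansi_encoded(name):
    """Encode name with the locale's preferred encoding (on Windows usually the
    ANSI code page, e.g. cp1252).

    Returns (encoding, bytes); bytes is None if some character of name
    cannot be represented in that encoding.
    """
    encoding = locale.getpreferredencoding(False)
    try:
        return encoding, name.encode(encoding)
    except UnicodeEncodeError:
        return encoding, None


print("filesystem encoding:", sys.getfilesystemencoding())
encoding, raw = ansi_encoded("C:/path_with_ä/test.nc")
if raw is None:
    print(f"path not representable in {encoding}")
else:
    print(f"candidate to try: Dataset(path, encoding={encoding!r})")
```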
In Python 2, filename = u"ää.nc" works under Windows, but filename = u"ää/ää.nc" only works if the folder "ää" already exists, e.g.:
import os
filename = u'ää/ää.nc'
path, fn = os.path.split(filename)
if not os.path.exists(path):
    os.makedirs(path)  # os handles non-ASCII characters correctly
from netCDF4 import Dataset
nc = Dataset(filename, 'w')  # passing fn instead also works
nc.close()
The path names also appear correctly in Windows Explorer. Here (Python 2 on Windows), sys.getfilesystemencoding() returns "mbcs".
Is the problem resolved for Python 3 on Windows? I'm not clear on what works and what doesn't.
Sorry for the confusion. No, the problem is not solved for Python 3 on Windows. Python 2 on Windows works partly, and Python 3 on Linux works fully. On Windows, the Unicode issue seems to produce wrong but valid file names (Python 3, your example) and invalid folder names (Python 2 and 3, my examples).
OK, thanks for the clarification. Not having access to Windows, I'm not sure where to go from here. One question that comes to mind: does the same issue arise if you open a text file at such a path on Windows, independently of netcdf4-python?
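A quick check along those lines, using only the standard library: it creates a folder with non-ASCII characters inside a temporary directory, writes a text file there and reads it back. The folder and file names mirror the examples above and are otherwise arbitrary; roundtrip_text_file is a hypothetical helper. If this works while netCDF4 fails on the same kind of path, that would point below Python, to the C libraries.

```python
import os
import tempfile


def roundtrip_text_file(folder_name="äää", file_name="ää.txt"):
    """Create folder_name/file_name under a fresh temp dir, write text, read it back."""
    base = tempfile.mkdtemp()
    folder = os.path.join(base, folder_name)
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, file_name)
    with open(path, "w", encoding="utf-8") as f:
        f.write("test")
    with open(path, encoding="utf-8") as f:
        return path, f.read()


path, content = roundtrip_text_file()
print(path, "->", content)
```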
For reference:
https://github.com/h5py/h5py/issues/839
Not sure whether this is related, but it's a nice discussion of the general problem.
Also:
https://forum.hdfgroup.org/t/non-english-characters-in-hdf5-file-name/4627/3
It seems clear that Unicode file names are not yet fully supported by HDF5 on Windows.
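A workaround sometimes used with HDF5-based libraries on Windows (not from this thread, and untested here) is to pass the 8.3 short form of an existing folder, obtained through the Win32 function GetShortPathNameW, since generated short names often avoid the non-ASCII characters. It only works for paths that already exist and only if 8.3 name generation is enabled on the volume; otherwise the hypothetical helper short_path below returns the path unchanged.

```python
import ctypes
import os
import sys


def short_path(path):
    """Return the Windows 8.3 short form of an existing path, or path itself
    when not on Windows, when the path does not exist, or when no short name exists."""
    if sys.platform != "win32":
        return path
    from ctypes import wintypes

    get_short = ctypes.windll.kernel32.GetShortPathNameW
    get_short.argtypes = [wintypes.LPCWSTR, wintypes.LPWSTR, wintypes.DWORD]
    get_short.restype = wintypes.DWORD

    needed = get_short(path, None, 0)  # required buffer size incl. terminator
    if needed == 0:
        return path
    buffer = ctypes.create_unicode_buffer(needed)
    if get_short(path, buffer, needed) == 0:
        return path
    return buffer.value


# Usage idea: shorten the (existing) folder and keep an ASCII file name.
folder = "C:/path_with_ä"
target = os.path.join(short_path(folder), "test.nc")
```

If the short form of the folder turns out to be ASCII-only, a new file could then be created with Dataset(target, "w"), provided the file name itself is ASCII.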
Thanks for the link to the interesting discussion. Let's hope they fix this issue at some point. By the way, creating a folder with os.makedirs() and writing a new text file into it with write() works as expected.