pydap
Decode error for Unicode characters
Maybe related to https://github.com/pydap/pydap/pull/152 and https://github.com/pydap/pydap/issues/164
I am trying to serve NetCDF data via PyDAP. The file in question works fine in a standard Python console with direct access using python-netcdf4, but in pydap the DAS is not available on the web interface, and the Apache log returns this error:
[Wed May 22 13:55:25.685392 2019] [wsgi:error] [pid 20625:tid 140168119965440] [client 157.249.114.74:44934] File "/usr/local/lib/python3.6/dist-packages/pydap/responses/das.py", line 44, in __iter__, referer: http://dap.metsis.met.no/
[Wed May 22 13:55:25.685402 2019] [wsgi:error] [pid 20625:tid 140168119965440] [client 157.249.114.74:44934] #yield line.encode('ascii'), referer: http://dap.metsis.met.no/
[Wed May 22 13:55:25.685429 2019] [wsgi:error] [pid 20625:tid 140168119965440] [client 157.249.114.74:44934] UnicodeEncodeError: 'ascii' codec can't encode character '\\xd8' in position 33: ordinal not in range(128), referer: http://dap.metsis.met.no/
A bad hack to fix the DAS is to catch the exception and retry with utf-8. That gave me a working DAS page, but it doesn't fix pydap.client, where loading such a dataset fails with:
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-1-2bb713f8a88f> in <module>
1 from pydap.client import open_url
----> 2 dataset = open_url('http://dap.metsis.met.no/SN99938.nc')
/usr/local/lib/python3.7/dist-packages/pydap/client.py in open_url(url, application, session, output_grid, timeout, verify)
65 """
66 dataset = DAPHandler(url, application, session, output_grid,
---> 67 timeout=timeout, verify=verify).dataset
68
69 # attach server-side functions
/usr/local/lib/python3.7/dist-packages/pydap/handlers/dap.py in __init__(self, url, application, session, output_grid, timeout, verify)
61 verify=verify)
62 raise_for_status(r)
---> 63 das = safe_charset_text(r)
64
65 # build the dataset from the DDS and add attributes from the DAS
/usr/local/lib/python3.7/dist-packages/pydap/handlers/dap.py in safe_charset_text(r)
115 else:
116 r.charset = get_charset(r)
--> 117 return r.text
118
119
/usr/local/lib/python3.7/dist-packages/webob/response.py in _text__get(self)
620 decoding = self.charset or self.default_body_encoding
621 body = self.body
--> 622 return body.decode(decoding, self.unicode_errors)
623
624 def _text__set(self, value):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 94: ordinal not in range(128)
If I put a print statement in /usr/local/lib/python3.7/dist-packages/webob/response.py at line 622, it tells me the decoding is set to ascii, while in my case it should be utf-8. The decoding is defined a few lines above by: decoding = self.charset or self.default_body_encoding. Adding another try/except to switch to utf-8 would work, but this is a hack and, most importantly, it happens on the client side, where I have no control over the pydap version used by a potential user. Do you have any suggestions?
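For anyone trying to reproduce this without the server, the two errors above look like two sides of the same mismatch. A minimal sketch, assuming the offending attribute contains the letter Ø (the '\xd8' in the Apache log is its code point, and 0xc3 is the first byte of its UTF-8 encoding); the variable names are only illustrative:

```python
# 'Ø' is U+00D8, the character shown as '\xd8' in the server log.
text = "Ø"

# UTF-8 stores it as two bytes; the first one is 0xc3.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)

# Server side: encoding the DAS line as ASCII fails on the character.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    server_error = exc
    print(exc)

# Client side: decoding the UTF-8 bytes as ASCII fails on byte 0xc3.
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    client_error = exc
    print(exc)

# Decoding with UTF-8 round-trips cleanly.
print(utf8_bytes.decode("utf-8") == text)
```

So as long as either end assumes ASCII, any non-ASCII metadata will trip one of these two exceptions.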
I tried to manually set the r.charset value to UTF-8 in src/pydap/handlers/dap.py, in DAPHandler() and in get_charset(), with no luck.
To debug, I added a print statement in webob/response.py to see which value is passed for decoding the response:
print('#######################')
print(self.charset)
Even after my hardcoded UTF-8 changes, it still prints 'ascii':
In [1]: from pydap.client import open_url
In [2]: url = 'http://internal.link.to/SN99938.nc'
In [3]: dataset = open_url(url)
#######################
ascii
#######################
ascii
It is my understanding that self.charset is set in PyDAP, so it looks like either it is not set properly or my manual changes are ignored.
The only way to bypass the error is to force the decoding to UTF-8 by replacing:
decoding = self.charset or self.default_body_encoding
with:
decoding='UTF-8'
A test file is available for debugging this issue at: https://epinux.com/index.php/s/3cixFyp7yktaaWL
Can you help with this? It is a crucial setback for countries that use Unicode characters in their NetCDF metadata :(
Experiencing the same thing, for this URL: https://thredds.met.no/thredds/dodsC/meps25epsarchive/2017/10/29/meps_mbr0_pp_2_5km_20171029T00Z.nc
Left a comment at #162 with some more details.
This seemed to work for me, in pydap/handlers/dap.py:
 def get_charset(r):
     charset = r.charset
     if not charset:
-        charset = 'ascii'
+        charset = 'utf-8'
     return charset
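A rough sketch of why the utf-8 default should be safe for existing ASCII-only servers: UTF-8 is a superset of ASCII, so pure ASCII bodies decode to the same text either way. FakeResponse and decode_body below are hypothetical stand-ins for illustration only (not pydap or WebOb classes), and the attribute strings are made up:

```python
class FakeResponse:
    """Hypothetical stand-in exposing only the .body and .charset used here."""

    def __init__(self, body, charset=None):
        self.body = body
        self.charset = charset


def get_charset(r):
    # Same logic as the patched function: fall back to utf-8, not ascii.
    charset = r.charset
    if not charset:
        charset = "utf-8"
    return charset


def decode_body(r):
    # Hypothetical helper: decode the raw body with the chosen charset.
    return r.body.decode(get_charset(r))


# No charset declared, ASCII-only body: unchanged behaviour.
ascii_only = FakeResponse(b'String title "Station";')
# No charset declared, UTF-8 body: now decodes instead of raising.
with_unicode = FakeResponse('String name "Ørland";'.encode("utf-8"))
# Charset declared by the server: still honoured.
declared = FakeResponse('String name "Ørland";'.encode("latin-1"), charset="latin-1")

for r in (ascii_only, with_unicode, declared):
    print(decode_body(r))
```

Note that a body sent in some other encoding without a declared charset would still fail or decode wrongly; the default only helps servers that actually send UTF-8.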
@petejan did you also try to serve the same file via pydap-server, or did you just read it using pydap-client?
I was using the pydap-client to open http://thredds.aodn.org.au/thredds/catalog/IMOS/ABOS/SOTS/2018/catalog.html?dataset=IMOS/ABOS/SOTS/2018/IMOS_ABOS-SOTS_COSTZ_20180801_SOFS_FV00_SOFS-7.5-2018-SBE37SMP-ODO-RS232-03715969-30m_END-20190327_C-20190606.nc
Example:
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 16:52:21)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from pydap.client import open_url
>>> dataset = open_url('http://thredds.aodn.org.au/thredds/dodsC/IMOS/ABOS/SOTS/2018/IMOS_ABOS-SOTS_COSTZ_20180801_SOFS_FV00_SOFS-7.5-2018-SBE37SMP-ODO-RS232-03715969-30m_END-20190327_C-20190606.nc.html')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydap/client.py", line 67, in open_url
timeout).dataset
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydap/handlers/dap.py", line 54, in __init__
raise_for_status(r)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pydap/net.py", line 34, in raise_for_status
detail=response.status+'\n'+response.text,
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/webob/response.py", line 622, in _text__get
return body.decode(decoding, self.unicode_errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
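One observation on this particular traceback: byte 0x8b at position 1 is what the second byte of a gzip stream looks like (gzip data starts with 0x1f 0x8b). That suggests, though nothing in this thread confirms it, that the body here was gzip-compressed and got decoded as text without being decompressed first, which would be a different problem from the charset default. A small sketch with a made-up payload reproduces the same message:

```python
import gzip

# Made-up payload; only the gzip framing matters here.
original = b"Dataset { } example;"
payload = gzip.compress(original)

# Every gzip stream starts with the magic bytes 0x1f 0x8b.
print(payload[:2])

# Decoding the compressed bytes as text fails at position 1,
# because 0x1f is valid on its own but 0x8b cannot start a character.
try:
    payload.decode("utf-8")
except UnicodeDecodeError as exc:
    err = exc
    print(exc)

# Decompressing first gives back decodable text.
print(gzip.decompress(payload).decode("utf-8"))
```

If that is what is happening, changing the default charset to utf-8 would not help for this URL; the response would need to be decompressed before decoding.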
FYI, this also occurs with this dataset:
import xarray as xr
ds = xr.open_dataset("https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_0p25deg/Best", engine="pydap")
UnicodeDecodeError Traceback (most recent call last)
/var/folders/rf/26llfhwd68x7cftb1z3h000w0000gp/T/ipykernel_837/4088958308.py in <module>
----> 1 ds = xr.open_dataset(url, engine="pydap")
2 ds
~/miniconda3/envs/main/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
495
496 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 497 backend_ds = backend.open_dataset(
498 filename_or_obj,
499 drop_variables=drop_variables,
~/miniconda3/envs/main/lib/python3.9/site-packages/xarray/backends/pydap_.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, session, lock)
137 )
138
--> 139 store = PydapDataStore.open(
140 filename_or_obj,
141 session=session,
~/miniconda3/envs/main/lib/python3.9/site-packages/xarray/backends/pydap_.py in open(cls, url, session)
91 def open(cls, url, session=None):
92
---> 93 ds = pydap.client.open_url(url, session=session)
94 return cls(ds)
95
~/miniconda3/envs/main/lib/python3.9/site-packages/pydap/client.py in open_url(url, application, session, output_grid, timeout)
64 never retrieve coordinate axes.
65 """
---> 66 dataset = DAPHandler(url, application, session, output_grid,
67 timeout).dataset
68
~/miniconda3/envs/main/lib/python3.9/site-packages/pydap/handlers/dap.py in __init__(self, url, application, session, output_grid, timeout)
62 if not r.charset:
63 r.charset = 'ascii'
---> 64 das = r.text
65
66 # build the dataset from the DDS and add attributes from the DAS
~/miniconda3/envs/main/lib/python3.9/site-packages/webob/response.py in _text__get(self)
620 decoding = self.charset or self.default_body_encoding
621 body = self.body
--> 622 return body.decode(decoding, self.unicode_errors)
623
624 def _text__set(self, value):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 69191: ordinal not in range(128)
Have you solved this problem? I am getting the same error message:
++++++++ code +++++++++++
#from pydap.client import open_url
#import xarray as xr
#import time
thredds_url = 'https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1840/' # ORNL DAAC TDS OPeNDAP URL
# for Daymet V4 Daily Files
before = time.time()
cnt = 0
for g_name in granule_names:
    print(' GRANULE_NAME ---->', g_name)
    granule_dap = thredds_url + g_name.replace('Daymet_Daily_V4.','')
    print(granule_dap)
    # Using pydap's open_url
    thredds_ds = open_url(granule_dap)
    # Xarray DataSet - opening dataset via remote OPeNDAP
    ds = xr.open_dataset(xr.backends.PydapDataStore(thredds_ds), decode_coords="all")
    temp=ds['prcp'].sel(x=slice(lccbounds.minx[0],lccbounds.maxx[0]), y=slice(lccbounds.maxy[0],lccbounds.miny[0]))
    if cnt==0:
        prcp = temp
    else:
        prcp = xr.concat([prcp, temp], dim="time")
    cnt += 1
# save to netcdf
prcp.to_netcdf(var + '_tdssubset.nc')
print("Processing Time: ", time.time() - before, 'seconds')
#Processing Time: 50.4509379863739 seconds
++++++++++ error message +++++++++++
GRANULE_NAME ----> Daymet_Daily_V4.daymet_v4_daily_na_prcp_2010.nc
https://thredds.daac.ornl.gov/thredds/dodsC/ornldaac/1840/daymet_v4_daily_na_prcp_2010.nc
UnicodeDecodeError Traceback (most recent call last)
/tmp/ipykernel_4146/70333123.py in
~/bc_gov/lib/python3.8/site-packages/pydap/client.py in open_url(url, application, session, output_grid, timeout)
64 never retrieve coordinate axes.
65 """
---> 66 dataset = DAPHandler(url, application, session, output_grid,
67 timeout).dataset
68
~/bc_gov/lib/python3.8/site-packages/pydap/handlers/dap.py in __init__(self, url, application, session, output_grid, timeout)
55 if not r.charset:
56 r.charset = 'ascii'
---> 57 dds = r.text
58
59 dasurl = urlunsplit((scheme, netloc, path + '.das', query, fragment))
~/bc_gov/lib/python3.8/site-packages/webob/response.py in _text__get(self)
620 decoding = self.charset or self.default_body_encoding
621 body = self.body
--> 622 return body.decode(decoding, self.unicode_errors)
623
624 def _text__set(self, value):
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 1: ordinal not in range(128)