UnicodeEncodeError: 'ascii' codec can't encode.... (includes test dataset)
@amercader, we (the civic hackers pushing the Bulgarian open data portal) are closely following your work. Thank you for the datapusher and the other tools you're working on.
We have a show-stopper problem with the datapusher, and it comes up quite often. It could be the data: it might be strangely formatted, or the problem may simply be that it's Cyrillic. Please give us a hint.
This is the dataset where the error occurs: http://opendata.obshtestvo.bg/dataset/spisak-na-razprostranitelite-ne-vinetni-stikeri
This is our staging server, so you can play around without worrying about the data or about crashing anything.
This is what we get in the DataStore tab:
Error: CKAN DataStore bad response. Status code: 500 Internal Server Error. At: http://opendata.obshtestvo.bg/api/3/action/datastore_create.
HTTP status code: 500
Response: <html> <head> <title>Server Error</title> </head> <body> <h1>Server Error</h1> An internal server error occurred </body> </html>
Requested URL: http://opendata.obshtestvo.bg/api/3/action/datastore_create
with:
Fetching from: http://opendata.obshtestvo.bg/dataset/08b276a7-915f-43aa-babc-9f0a9a4f7fc0/resource/642392b2-9b96-42bd-9f16-e9bce274c308/download/spisak-na-razprostranitelite-na-vinetni-stikeri20150630.csv
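The traceback further down shows the 500 being raised inside datastore_create itself, so one way to take the datapusher out of the picture is to call that action directly with a single Cyrillic column. A minimal Python 3 sketch, assuming only the standard library: the API key and resource id are placeholders, the helper name is hypothetical, and only the resource_id, fields and records parameters of datastore_create are used. The request is built but not sent.

```python
import json
import urllib.request

# Placeholders: substitute a real API key and the id of a resource you may write to.
CKAN_URL = "http://opendata.obshtestvo.bg"
API_KEY = "YOUR-API-KEY"
RESOURCE_ID = "YOUR-RESOURCE-ID"


def build_datastore_create_request(base_url, api_key, resource_id, field_id):
    """Build (but do not send) a datastore_create POST with one text field."""
    payload = {
        "resource_id": resource_id,
        "fields": [{"id": field_id, "type": "text"}],
        "records": [{field_id: "test"}],
    }
    # json.dumps escapes non-ASCII as \uXXXX by default, so the body is plain ASCII.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/3/action/datastore_create",
        data=body,
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


req = build_datastore_create_request(CKAN_URL, API_KEY, RESOURCE_ID, "Кърджали")

# To actually send it (needs network access and a valid key; a 500 raises HTTPError):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```

If this minimal request also returns a 500, the problem is in the DataStore's handling of non-ASCII field ids rather than in how the datapusher parses the file.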
By the way, what does "Determined headers and types" mean? It's also giving us this:
level: INFO
timestamp: 2016-01-28T21:38:43.591342
module: jobs
funcName: push_to_datastore
lineno: 377
message: Determined headers and types: [{'type': u'text', 'id': u'690,"\u0411\u041f 6007 \u041a\u044a\u0440\u0434\u0436\u0430\u043b\u0438 7","\u041a\u044a\u0440\u0434\u0436\u0430\u043b\u0438","\u041a\u044a\u0440\u0434\u0436\u0430\u043b\u0438","\u041a\u044a\u0440\u0434\u0436\u0430\u043b\u0438 7 \u043a\u0432. ""\u0412\u044a\u0437\u0440\u043e\u0436\u0434\u0435\u043d\u0446\u0438"" \u0431\u043b. 2","0361/65582 ","8-12;"13-16.30""'}, {'type': u'text', 'id': u'\u043d\u0435 \u0440\u0430\u0431\u043e\u0442\u0438""'}, {'type': u'text', 'id': u'\u043d\u0435 \u0440\u0430\u0431\u043e\u0442\u0438""";'}]
The unescaped (decoded) version of the "Determined headers and types" message is:
[{'id': '690,"БП 6007 Кърджали 7","Кърджали","Кърджали","Кърджали 7 кв. ""Възрожденци"" бл. 2","0361/65582 ","8-12;"13-16.30""', 'type': 'text'}, {'id': 'не работи""', 'type': 'text'}, {'id': 'не работи""";', 'type': 'text'}]
It seems the CSV is encoded in Windows-1251. Is this the problem? Interestingly, the data in "Determined headers and types" is actually line 693. Why is this line detected as the header row?
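Both questions can be checked locally before the file ever reaches the datapusher. A minimal Python 3 sketch, assuming the raw bytes of the CSV are already in hand: it tries UTF-8 first and falls back to Windows-1251 (cp1251), then flags rows whose field count differs from the most common one, which is one way to spot malformed quoting such as "8-12;"13-16.30"" in the row above. The function names and sample data are hypothetical, and this is not how the datapusher (or the library it uses) actually guesses headers.

```python
import csv
import io
from collections import Counter


def decode_with_fallback(raw, encodings=("utf-8", "cp1251")):
    """Return (text, encoding) for the first candidate encoding that decodes raw.

    A clean UTF-8 decode is only a heuristic: some cp1251 byte sequences
    happen to be valid UTF-8, so eyeball the decoded text as well.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            pass
    raise ValueError("none of %r could decode the data" % (encodings,))


def rows_with_unexpected_width(text):
    """Return (line_number, field_count) for rows whose width differs from the usual one.

    line_number is csv.reader.line_num, i.e. the last physical line of the row.
    Blank lines parse as zero-width rows and are reported too.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    widths = [(reader.line_num, len(row)) for row in reader]
    if not widths:
        return []
    usual = Counter(width for _, width in widths).most_common(1)[0][0]
    return [(line, width) for line, width in widths if width != usual]


# Hypothetical sample: cp1251-encoded bytes with one row that has an extra field.
raw = 'a,b,c\r\n690,"Кърджали",x\r\n1,2,3,4\r\n'.encode("cp1251")
text, enc = decode_with_fallback(raw)
print(enc)                               # cp1251
print(rows_with_unexpected_width(text))  # [(3, 4)]
```

Running the real file through these two checks should show whether it is Windows-1251 and which lines have broken quoting.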
This is the stack trace:
URL: http://opendata.obshtestvo.bg/api/3/action/datastore_create
File '/usr/lib/ckan/default/lib/python2.7/site-packages/weberror/errormiddleware.py', line 171 in __call__
app_iter = self.application(environ, sr_checker)
File '/usr/lib/ckan/default/lib/python2.7/site-packages/webob/dec.py', line 147 in __call__
resp = self.call_func(req, *args, **self.kwargs)
File '/usr/lib/ckan/default/lib/python2.7/site-packages/webob/dec.py', line 208 in call_func
return self.func(req, *args, **kwargs)
File '/usr/lib/ckan/default/lib/python2.7/site-packages/fanstatic/publisher.py', line 234 in __call__
return request.get_response(self.app)
File '/usr/lib/ckan/default/lib/python2.7/site-packages/webob/request.py', line 1053 in get_response
application, catch_exc_info=False)
File '/usr/lib/ckan/default/lib/python2.7/site-packages/webob/request.py', line 1022 in call_application
app_iter = application(self.environ, start_response)
File '/usr/lib/ckan/default/lib/python2.7/site-packages/webob/dec.py', line 147 in __call__
resp = self.call_func(req, *args, **self.kwargs)
File '/usr/lib/ckan/default/lib/python2.7/site-packages/webob/dec.py', line 208 in call_func
return self.func(req, *args, **kwargs)
File '/usr/lib/ckan/default/lib/python2.7/site-packages/fanstatic/injector.py', line 54 in __call__
response = request.get_response(self.app)
File '/usr/lib/ckan/default/lib/python2.7/site-packages/webob/request.py', line 1053 in get_response
application, catch_exc_info=False)
File '/usr/lib/ckan/default/lib/python2.7/site-packages/webob/request.py', line 1022 in call_application
app_iter = application(self.environ, start_response)
File '/usr/lib/ckan/default/src/ckan/ckan/config/middleware.py', line 389 in inner
result = application(environ, start_response)
File '/usr/lib/ckan/default/lib/python2.7/site-packages/beaker/middleware.py', line 73 in __call__
return self.app(environ, start_response)
File '/usr/lib/ckan/default/lib/python2.7/site-packages/beaker/middleware.py', line 155 in __call__
return self.wrap_app(environ, session_start_response)
File '/usr/lib/ckan/default/lib/python2.7/site-packages/routes/middleware.py', line 131 in __call__
response = self.app(environ, start_response)
File '/usr/lib/ckan/default/lib/python2.7/site-packages/pylons/wsgiapp.py', line 125 in __call__
response = self.dispatch(controller, environ, start_response)
File '/usr/lib/ckan/default/lib/python2.7/site-packages/pylons/wsgiapp.py', line 324 in dispatch
return controller(environ, start_response)
File '/usr/lib/ckan/default/src/ckan/ckan/controllers/api.py', line 70 in __call__
return base.BaseController.__call__(self, environ, start_response)
File '/usr/lib/ckan/default/src/ckan/ckan/lib/base.py', line 337 in __call__
res = WSGIController.__call__(self, environ, start_response)
File '/usr/lib/ckan/default/lib/python2.7/site-packages/pylons/controllers/core.py', line 221 in __call__
response = self._dispatch_call()
File '/usr/lib/ckan/default/lib/python2.7/site-packages/pylons/controllers/core.py', line 172 in _dispatch_call
response = self._inspect_call(func)
File '/usr/lib/ckan/default/lib/python2.7/site-packages/pylons/controllers/core.py', line 107 in _inspect_call
result = self._perform_call(func, args)
File '/usr/lib/ckan/default/lib/python2.7/site-packages/pylons/controllers/core.py', line 60 in _perform_call
return func(**args)
File '/usr/lib/ckan/default/src/ckan/ckan/controllers/api.py', line 205 in action
result = function(context, request_data)
File '/usr/lib/ckan/default/src/ckan/ckan/logic/__init__.py', line 416 in wrapped
result = _action(context, data_dict, **kw)
File '/usr/lib/ckan/default/src/ckan/ckanext/datastore/logic/action.py', line 141 in datastore_create
result = db.create(context, data_dict)
File '/usr/lib/ckan/default/src/ckan/ckanext/datastore/db.py', line 1071 in create
create_table(context, data_dict)
File '/usr/lib/ckan/default/src/ckan/ckanext/datastore/db.py', line 306 in create_table
check_fields(context, supplied_fields)
File '/usr/lib/ckan/default/src/ckan/ckanext/datastore/db.py', line 273 in check_fields
field['id'])]
UnicodeEncodeError: 'ascii' codec can't encode characters in position 5-6: ordinal not in range(128)
CGI Variables
-------------
CKAN_CURRENT_URL: '/api/3/action/datastore_create'
CKAN_LANG: 'en'
CKAN_LANG_IS_DEFAULT: True
CONTENT_LENGTH: '242477'
CONTENT_TYPE: 'application/json; charset=utf-8'
DOCUMENT_ROOT: '/var/www'
GATEWAY_INTERFACE: 'CGI/1.1'
HTTP_ACCEPT: '*/*'
HTTP_ACCEPT_ENCODING: 'gzip, deflate'
HTTP_AUTHORIZATION: 'e2067dcf-c6d5-4c3b-9ee9-f61729daf378'
HTTP_CONNECTION: 'close'
HTTP_HOST: 'opendata.obshtestvo.bg'
HTTP_USER_AGENT: 'python-requests/2.7.0 CPython/2.7.3 Linux/3.16.0-4-amd64'
HTTP_X_FORWARDED_FOR: '10.255.0.1'
PATH_INFO: '/api/3/action/datastore_create'
PATH_TRANSLATED: '/etc/ckan/default/opendatabulgaria.wsgi/api/3/action/datastore_create'
REMOTE_ADDR: '10.255.0.1'
REMOTE_PORT: '58505'
REQUEST_METHOD: 'POST'
REQUEST_URI: '/api/3/action/datastore_create'
SCRIPT_FILENAME: '/etc/ckan/default/opendatabulgaria.wsgi'
SERVER_ADDR: '127.0.0.1'
SERVER_ADMIN: '[no address given]'
SERVER_NAME: 'opendata.obshtestvo.bg'
SERVER_PORT: '80'
SERVER_PROTOCOL: 'HTTP/1.0'
SERVER_SIGNATURE: '<address>Apache/2.2.22 (Debian) Server at opendata.obshtestvo.bg Port 80</address>\n'
SERVER_SOFTWARE: 'Apache/2.2.22 (Debian)'
WSGI Variables
--------------
application: <fanstatic.publisher.Delegator object at 0x7ff7d6c59290>
beaker.cache: <beaker.cache.CacheManager object at 0x7ff7d6c59310>
beaker.get_session: <bound method SessionMiddleware._get_session of <beaker.middleware.SessionMiddleware object at 0x7ff7d6507990>>
beaker.session: {'_accessed_time': 1454008286.581036, '_creation_time': 1454008286.581036}
fanstatic.needed: <fanstatic.core.NeededResources object at 0x7ff7d88f99d0>
mod_wsgi.application_group: 'opendata.obshtestvo.bg|'
mod_wsgi.callable_object: 'application'
mod_wsgi.handler_script: ''
mod_wsgi.input_chunked: '0'
mod_wsgi.listener_host: ''
mod_wsgi.listener_port: '8080'
mod_wsgi.process_group: 'opendatabulgaria'
mod_wsgi.request_handler: 'wsgi-script'
mod_wsgi.script_reloading: '1'
mod_wsgi.version: (3, 3)
paste.cookies: (<SimpleCookie: >, '')
paste.registry: <paste.registry.Registry object at 0x7ff7d88e46d0>
paste.throw_errors: True
pylons.action_method: <bound method ApiController.action of <ckan.controllers.api.ApiController object at 0x7ff7d899f850>>
pylons.controller: <ckan.controllers.api.ApiController object at 0x7ff7d899f850>
pylons.environ_config: {'session': 'beaker.session', 'cache': 'beaker.cache'}
pylons.pylons: <pylons.util.PylonsContext object at 0x7ff7d88ef250>
pylons.routes_dict: {'action': u'action', 'controller': u'api', 'ver': 3, 'logic_function': u'datastore_create'}
pylons.status_code_redirect: True
repoze.who.api: <repoze.who.api.API object at 0x7ff7d88e4d90>
repoze.who.logger: <logging.Logger object at 0x7ff7d6507610>
repoze.who.plugins: {'ckan.lib.authenticator:UsernamePasswordAuthenticator': <ckan.lib.authenticator.UsernamePasswordAuthenticator object at 0x7ff7d6c59850>, 'friendlyform': <FriendlyFormPlugin 140702429166864>, 'auth_tkt': <CkanAuthTktCookiePlugin 140702429166416>}
routes.route: <routes.route.Route object at 0x7ff7d695bb10>
routes.url: <routes.util.URLGenerator object at 0x7ff7d88ef3d0>
webob._parsed_query_vars: (GET([]), '')
webob.adhoc_attrs: {'response': <Response at 0x7ff7d88ef050 200 OK>, 'language': 'en-us'}
webob.is_body_seekable: True
wsgi process: 'Multi process AND threads (?)'
wsgi.file_wrapper: <built-in method file_wrapper of mod_wsgi.Adapter object at 0x7ff7d88e6468>
wsgi.version: (1, 1)
wsgiorg.routing_args: (<routes.util.URLGenerator object at 0x7ff7d88ef3d0>, {'action': u'action', 'controller': u'api', 'ver': 3, 'logic_function': u'datastore_create'})
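The failure at the bottom of the traceback can be reproduced with the guessed header from the log above. Something in check_fields appears to convert the field id to a byte string with the ASCII codec, which is what Python 2's str() does with a unicode value; Python 3's explicit .encode("ascii") raises the same UnicodeEncodeError. The sketch below (Python 3, hypothetical helper name) reports which field ids would trip that conversion and where. For the header starting with 690,"БП, characters 5 and 6 are БП, which matches "position 5-6" in the traceback.

```python
def non_ascii_field_ids(fields):
    """Yield (field_id, start, end) for every field id the ASCII codec rejects.

    start/end bound the first run of non-ASCII characters, which is the
    span a UnicodeEncodeError reports (its message shows start..end-1).
    """
    for field in fields:
        try:
            field["id"].encode("ascii")
        except UnicodeEncodeError as exc:
            yield field["id"], exc.start, exc.end


# Excerpts of the field ids from the log above, plus one hypothetical ASCII id.
fields = [
    {"id": '690,"БП 6007 Кърджали 7"', "type": "text"},
    {"id": "не работи", "type": "text"},
    {"id": "city", "type": "text"},
]

for field_id, start, end in non_ascii_field_ids(fields):
    print("%r: non-ASCII at position %d-%d" % (field_id, start, end - 1))
```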
I'm having the same issue. My CSV file is in UTF-8, so the file encoding shouldn't be the problem. I suspect the issue is that the headers are partially ASCII and partially UTF-8. I need to check.
For Python 2, I guess str(header) needs to become header.encode('utf-8'):
- headers = [str(header) for header in headers]
+ headers = [header.encode('utf-8') for header in headers]
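In Python 3 terms, the diff above is the difference between forcing headers through ASCII and encoding them as UTF-8 bytes. A minimal sketch, assuming headers may arrive as a mix of text and already-UTF-8 bytes (the mix suspected in the comment above); the helper name and sample headers are hypothetical, and this is not the datapusher code itself.

```python
def to_utf8_bytes(header):
    """Return a header as UTF-8 bytes, accepting either text or UTF-8 bytes."""
    if isinstance(header, bytes):
        header.decode("utf-8")  # validation only: raises UnicodeDecodeError if not UTF-8
        return header
    return header.encode("utf-8")


# Hypothetical mix of ASCII, Cyrillic and already-encoded headers.
headers = ["id", "Кърджали", b"city"]

# Roughly the old behaviour: forcing text headers through ASCII fails on Cyrillic.
try:
    [h.encode("ascii") for h in headers if isinstance(h, str)]
except UnicodeEncodeError as exc:
    print("ASCII conversion failed:", exc)

# The proposed behaviour: every header becomes UTF-8 bytes and round-trips losslessly.
utf8_headers = [to_utf8_bytes(h) for h in headers]
print([h.decode("utf-8") for h in utf8_headers])  # ['id', 'Кърджали', 'city']
```

Under Python 3 the simpler route is usually to keep headers as str throughout; converting to UTF-8 bytes matters mainly for Python 2 code like the line in the diff, where downstream code expects byte strings.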