UnicodeEncodeError: 'latin-1' codec can't encode character '\u2082' in position 289702: Body ('₂') is not valid Latin-1.

Open wlad opened this issue 4 years ago • 1 comments

I'm trying to POST an XML file which has elements like <items id="text">SpO₂</items>. The request fails with the following error:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u2082' in position 289702: Body ('₂') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

Traceback (most recent call last):
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/RequestsLibrary/utils.py", line 138, in decorator
    return func(*args, **kwargs)
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/RequestsLibrary/RequestsOnSessionKeywords.py", line 60, in post_on_session
    response = self._common_request("post", session, url,
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/RequestsLibrary/RequestsKeywords.py", line 37, in _common_request
    resp = method_function(
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/requests/sessions.py", line 590, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/urllib3/connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/home/wlad/.venvs/ehrbase/lib/python3.9/site-packages/urllib3/connection.py", line 234, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/lib/python3.9/http/client.py", line 1257, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.9/http/client.py", line 1302, in _send_request
    body = _encode(body, 'body')
  File "/usr/lib/python3.9/http/client.py", line 164, in _encode
    raise UnicodeEncodeError(

Here is how I send the request (${file} is loaded via the Get File keyword):

${resp}=            POST On Session      ${SUT}    /definition/template/adl1.4   expected_status=anything
                    ...                  data=${file}    headers=${headers}

If I remove ₂ from the payload, the request succeeds.

What am I missing? The XML file actually starts with

<?xml version="1.0" encoding="utf-8" standalone="yes"?>

wlad, Nov 17 '21 19:11

Hi @wlad, I was able to reproduce the error.

The cause of this behavior is in Python's http.client module.

In http/client.py, we have the following code:

def _encode(data, name='data'):
    """Call data.encode("latin-1") but show a better error message."""
    try:
        return data.encode("latin-1")
    except UnicodeEncodeError as err:
        raise UnicodeEncodeError(
            err.encoding,
            err.object,
            err.start,
            err.end,
            "%s (%.20r) is not valid Latin-1. Use %s.encode('utf-8') "
            "if you want to send it encoded in UTF-8." %
            (name.title(), data[err.start:err.end], name)) from None

The line return data.encode("latin-1") is where the error occurs.

As you can see, it tries to encode a str body as latin-1, disregarding the <?xml version="1.0" encoding="utf-8" standalone="yes"?> declaration in the XML file. The character ₂ (U+2082) has no latin-1 representation, so the encoding fails.
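The failure can be reproduced without any HTTP traffic. The sketch below uses a hypothetical helper, encode_body, that only mimics the relevant branch of http.client (str bodies are encoded as latin-1, bytes are passed through); it is a simplification for illustration, not the library's actual code path.

```python
def encode_body(body):
    """Hypothetical stand-in for http.client's handling of the request body."""
    if isinstance(body, str):
        # str bodies are encoded as latin-1, regardless of any XML declaration
        return body.encode("latin-1")
    # bytes bodies are sent as they are
    return body


xml = '<items id="text">SpO₂</items>'

# Passing the text as str fails, because U+2082 is outside latin-1.
try:
    encode_body(xml)
    failed = False
except UnicodeEncodeError:
    failed = True

# Passing bytes that were already encoded as UTF-8 goes through untouched.
utf8_body = encode_body(xml.encode("utf-8"))
```

Here failed ends up True, while utf8_body holds the UTF-8 bytes of the document, which is why handing bytes to the request avoids the error.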

This issue has been raised in requests, too: https://github.com/psf/requests/issues/1822#issuecomment-30996036

There is a workaround. Read the file as UTF-8 (the Get File default) and pass the body as UTF-8 encoded bytes, which http.client sends unchanged. If you modify your test case like this, the request should succeed:

${file}=    Get File    /path/to/file.xml
${file_utf8}=    Evaluate    $file.encode("utf-8")
${resp}=    POST On Session    ${SUT}    /definition/template/adl1.4    expected_status=anything
...    data=${file_utf8}    headers=${headers}

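Alternatively, read the file as latin-1. Every byte then maps to exactly one character, so the latin-1 encoding done by http.client reproduces the file's original bytes: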

${file}=    Get File    /path/to/file.xml    encoding=latin-1
${resp}=    POST On Session    ${SUT}    /definition/template/adl1.4    expected_status=anything
...    data=${file}    headers=${headers}
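Why reading the file as latin-1 helps can be sketched in plain Python. The snippet assumes the file on disk is UTF-8 and simulates the two steps involved: decoding the bytes as latin-1 (what Get File does with encoding=latin-1) and encoding the resulting text as latin-1 again (what http.client does with a str body). The variable names are illustrative only.

```python
# Bytes as stored in the UTF-8 encoded XML file
original_bytes = '<items id="text">SpO₂</items>'.encode("utf-8")

# Decoding as latin-1 never fails: each byte becomes one character
as_text = original_bytes.decode("latin-1")

# Encoding as latin-1 maps each character back to the same byte
on_the_wire = as_text.encode("latin-1")

round_trip_ok = on_the_wire == original_bytes
```

The text in as_text looks garbled when printed, but the bytes that reach the server are identical to the file content, so the server still receives valid UTF-8.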

robinmatz, Dec 30 '21 06:12