
UnicodeEncodeError in middleware.py

Open carstenfuchs opened this issue 3 years ago • 11 comments

Hello,

using Django Debug Toolbar 3.2.4 with Django 4.0.3, I get the following stack trace:

Traceback (most recent call last):
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/debug_toolbar/middleware.py", line 93, in __call__
    response.content = insert_before.join(bits)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/template/response.py", line 143, in content
    HttpResponse.content.fset(self, value)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/http/response.py", line 387, in content
    content = self.make_bytes(value)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/http/response.py", line 296, in make_bytes
    return bytes(value.encode(self.charset))
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 181020-181021: surrogates not allowed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/core/handlers/wsgi.py", line 132, in __call__
    response = self.get_response(request)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/core/handlers/base.py", line 140, in get_response
    response = self._middleware_chain(request)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/core/handlers/exception.py", line 57, in inner
    response = response_for_exception(request, exc)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/core/handlers/exception.py", line 139, in response_for_exception
    response = handle_uncaught_exception(
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/core/handlers/exception.py", line 180, in handle_uncaught_exception
    return debug.technical_500_response(request, *exc_info)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/views/debug.py", line 67, in technical_500_response
    return HttpResponse(html, status=status_code, content_type="text/html")
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/http/response.py", line 355, in __init__
    self.content = content
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/http/response.py", line 387, in content
    content = self.make_bytes(value)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/http/response.py", line 296, in make_bytes
    return bytes(value.encode(self.charset))
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 5995-5996: surrogates not allowed

The related section around line 93 in debug_toolbar/middleware.py is:

        # Always render the toolbar for the history panel, even if it is not
        # included in the response.
        rendered = toolbar.render_toolbar()

        # …

        # Insert the toolbar in the response.
        content = response.content.decode(response.charset)
        insert_before = dt_settings.get_config()["INSERT_BEFORE"]
        pattern = re.escape(insert_before)
        bits = re.split(pattern, content, flags=re.IGNORECASE)
        if len(bits) > 1:
            bits[-2] += rendered
            response.content = insert_before.join(bits)
            if "Content-Length" in response:
                response["Content-Length"] = len(response.content)
        return response
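The insertion step in this excerpt can be tried in isolation. This is a minimal sketch, not the toolbar's actual code: `insert_before_last` is a hypothetical helper, and the `</body>` default is an assumption about the `INSERT_BEFORE` setting.

```python
import re


def insert_before_last(content, rendered, marker="</body>"):
    """Insert `rendered` before the last case-insensitive match of `marker`.

    Hypothetical helper mirroring the split/join logic in the excerpt.
    """
    bits = re.split(re.escape(marker), content, flags=re.IGNORECASE)
    if len(bits) > 1:
        # bits[-2] is the text immediately preceding the last marker.
        bits[-2] += rendered
        # join() re-inserts the marker exactly as spelled in `marker`.
        return marker.join(bits)
    return content


html = "<html><body><p>Hi</p></body></html>"
result = insert_before_last(html, "<div id='toolbar'></div>")
```

In the middleware the joined string is then assigned to `response.content`, which encodes it with the response charset. That encoding step is where the traceback ends.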

If I replace line

            bits[-2] += rendered

with

            bits[-2] += rendered.encode('ascii', 'replace').decode()

in order to get rid of any problematic characters, it works.
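Encoding to ASCII with `'replace'` is lossy: it turns every non-ASCII character into `?`, not only the ones that break the response. A minimal sketch of the difference, assuming the offending characters are lone surrogates as the traceback suggests (the sample string is made up):

```python
# Hypothetical fragment of rendered toolbar HTML: one legitimate non-ASCII
# character ("»") and one lone surrogate that UTF-8 refuses to encode.
rendered = "Ausblenden » \udcc3"

# The ASCII workaround: every non-ASCII character becomes "?".
via_ascii = rendered.encode("ascii", "replace").decode()
# via_ascii == "Ausblenden ? ?"

# Less lossy: only characters that UTF-8 cannot encode become "?".
via_utf8 = rendered.encode("utf-8", "replace").decode("utf-8")
# via_utf8 == "Ausblenden » ?"
```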

Unfortunately, I have no idea what might cause this, and I'm not sure how to proceed from here.

carstenfuchs avatar Mar 29 '22 15:03 carstenfuchs

Do you know which characters are being rendered and in which panels they are coming from?

tim-schilling avatar Mar 29 '22 15:03 tim-schilling

I modified the above to replace the German umlauts (äöüÄÖÜß) with something safe. Not elegant at all, but:

            bits[-2] += rendered.replace('ä', 'XXX').replace('Ä', 'XXX').replace('ö', 'XXX').replace('Ö', 'XXX').replace('ü', 'XXX').replace('Ü', 'XXX').replace('ß', 'XXX').encode('ascii', 'backslashreplace').decode()

Replacing the umlauts alone was not enough; the encode-decode step is still necessary. With backslashreplace as the error handler for the remaining non-ASCII characters, this yields the attached screenshot (note the \xbb near the top). However, I'm not sure if this is actually the culprit: the Unicode characters that cause the trouble might still be elsewhere.

Grepping the page HTML source for occurrences of \x, I found these fragments:

  • <a id="djHideToolBarButton" href="#" title="Toolbar ausblenden">Ausblenden \xbb</a></li>
  • <button type="button" class="djDebugClose">\xd7</button>
  • (&#x27;nb&#x27;, &#x27;Norwegian Bokm\xe5l&#x27;),

Maybe it's one of these? If I can figure out what the original Unicode characters are for these, I can try and replace them as well.
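The `\x` escapes produced by `backslashreplace` are code points in hexadecimal, so the original characters can be looked up directly. A small sketch using only the standard library:

```python
import unicodedata

# The three escapes found in the page source, written as Python string escapes.
escapes = ["\xbb", "\xd7", "\xe5"]

names = {f"\\x{ord(ch):02x}": unicodedata.name(ch) for ch in escapes}
# names == {
#     "\\xbb": "RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK",   # »
#     "\\xd7": "MULTIPLICATION SIGN",                          # ×
#     "\\xe5": "LATIN SMALL LETTER A WITH RING ABOVE",         # å
# }
```

All three are ordinary Latin-1 characters that UTF-8 encodes without trouble. Surrogates live in the range U+D800 to U+DFFF, and `backslashreplace` would show them as `\udXXX` rather than `\xXX`.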

carstenfuchs avatar Mar 29 '22 19:03 carstenfuchs

I made some progress and eventually managed to find the surrogates that are mentioned in the stack trace:

In one of my apps, I have static files with German umlaut characters in their names, e.g. Handbuch/images/kalendereinträge.png

These files are listed in the "Static files" panel in section django.contrib.staticfiles.finders.AppDirectoriesFinder, where they cause the reported error.

My current work-around is to replace only the surrogate characters:

            rendered = rendered.replace('\udcc3\udcbc', '___???___')
            rendered = rendered.replace('\udcc3\udca4', '___???___')
            bits[-2] += rendered

However, I still have no idea how the surrogate characters come up in the first place: My system is Ubuntu 20.04 LTS and there is nothing special about the above mentioned files at all.

Can anyone reproduce this?
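One way to reproduce the surrogates without touching the filesystem, as a sketch: take the UTF-8 bytes of the file name and decode them as ASCII with the `surrogateescape` error handler, which Python uses for file names it cannot otherwise decode (PEP 383). That the names are stored as UTF-8 while the interpreter assumes ASCII is an assumption at this point in the thread.

```python
name = "kalendereinträge.png"
raw = name.encode("utf-8")          # bytes as stored on a UTF-8 filesystem

# An undecodable byte 0xNN becomes the lone surrogate U+DCNN.
decoded = raw.decode("ascii", errors="surrogateescape")
assert "\udcc3\udca4" in decoded    # the pair for "ä" (bytes C3 A4)

# Encoding the result as UTF-8 fails in the same way as the traceback.
try:
    decoded.encode("utf-8")
    error = None
except UnicodeEncodeError as exc:
    error = exc.reason              # "surrogates not allowed"

# The escape is reversible, so the original name can be recovered.
recovered = decoded.encode("ascii", errors="surrogateescape").decode("utf-8")
assert recovered == name
```

By the same mapping, the other pair in the workaround, `'\udcc3\udcbc'`, corresponds to the bytes C3 BC, which is "ü" in UTF-8.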

carstenfuchs avatar Mar 31 '22 18:03 carstenfuchs

Is it possible that your filesystem encoding (in Python) isn't set to UTF-8? My systemd user units always contain the following environment variables:

    Environment=LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_CTYPE=en_US.UTF-8

I remember that we had many problems in the past without this; the server process would basically crash each time someone uploaded files containing umlauts. I'm not 100% sure if this happened only with Python 2 or also with Python 3 though so this could be a dead end.
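To see which encoding the interpreter actually uses for file names, the values can be read at runtime. A minimal sketch: `filesystem_encoding_report` is a hypothetical helper name, and it should be run inside the server process (for example from a view), since a shell may have a different environment.

```python
import codecs
import locale
import sys


def filesystem_encoding_report():
    """Collect the encodings Python uses for file names and text I/O."""
    fs_encoding = sys.getfilesystemencoding()
    return {
        "filesystem_encoding": fs_encoding,
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        "preferred_encoding": locale.getpreferredencoding(False),
        # codecs.lookup() normalizes aliases such as "UTF8" or "utf_8".
        "is_utf8": codecs.lookup(fs_encoding).name == "utf-8",
    }


report = filesystem_encoding_report()
```

If `is_utf8` is false, file names containing non-ASCII bytes may come back with surrogate escapes in them.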

matthiask avatar Mar 31 '22 19:03 matthiask

... that being said, maybe the toolbar should expect filenames which cannot be properly converted to UTF-8...

matthiask avatar Mar 31 '22 19:03 matthiask

@matthiask thanks for your hints! You're right, my filesystem encoding in Python was indeed set to ascii and I could resolve the problem by setting LANG=de_DE.UTF-8 in the Apache /etc/apache2/envvars config file. (I used to use LANG=C for over 10 years, so now I'm mildly worried that the change might introduce subtle side effects elsewhere. Maybe I'll still switch to en_US.UTF-8, after all.)

Although the problem was eventually caused by my Apache config alone, this seems to be a very complicated topic. Maybe it would be possible for djdt to warn about non-UTF-8 filesystem encodings? (Or better about filenames that cannot be properly decoded?)
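Detecting file names that could not be decoded is straightforward in principle. A rough sketch, assuming that such names show up as lone surrogates in the range U+DC80 to U+DCFF (the `surrogateescape` convention); both helpers are hypothetical and not part of django-debug-toolbar:

```python
def has_escaped_bytes(name):
    """Return True if `name` contains surrogateescape'd bytes (U+DC80..U+DCFF)."""
    return any("\udc80" <= ch <= "\udcff" for ch in name)


def undecodable_names(names):
    """Return (name, printable form) pairs for names with escaped bytes."""
    return [
        # backslashreplace keeps valid characters and escapes only surrogates.
        (name, name.encode("utf-8", "backslashreplace").decode("utf-8"))
        for name in names
        if has_escaped_bytes(name)
    ]


found = undecodable_names(["logo.png", "kalendereintr\udcc3\udca4ge.png"])
# found[0][1] == "kalendereintr\\udcc3\\udca4ge.png"
```

The printable form can be rendered safely in a response, which would let a warning be shown instead of the request failing.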

Thank you!

carstenfuchs avatar Apr 01 '22 14:04 carstenfuchs

Oh yeah, it's complicated and very annoying.

I'm unsure what we should do. On one hand, django-debug-toolbar shouldn't crash. On the other hand, the Django docs (https://docs.djangoproject.com/en/4.0/ref/unicode/#files) state that Django expects a UTF-8 environment (not C). So maybe the somewhat strange behavior is to be expected?

I didn't even know that this was documented; this section was added recently (in 2015 😅).

matthiask avatar Apr 01 '22 14:04 matthiask

Thanks for the link! My project started in 2011 and even though I regularly work through all Release Notes very carefully, it is easy to miss such updates in the docs, useful and worthwhile as they are.

Imho, it would be ideal if this could be covered by a Django system check. However, such a check would be futile here, given that command-line shells and web servers tend to have different environments.

carstenfuchs avatar Apr 01 '22 15:04 carstenfuchs

We have already discussed adding checks for issues which are (arguably) only surfaced, but not really caused, by django-debug-toolbar. The last time it was about static files as well: https://github.com/jazzband/django-debug-toolbar/issues/1318

Such things are really hard to debug sometimes if you don't already know where to look so I think it may be time to revisit my stance on this. I wrote that I am slightly against adding checks for other apps (even if those other apps are bundled with Django) but I'm not so sure anymore.

Here would probably be the place for such a new check: https://github.com/jazzband/django-debug-toolbar/blob/54e63f0494414ae0d93abc6e202d5f644c75952a/debug_toolbar/panels/staticfiles.py#L182-L203

matthiask avatar Apr 01 '22 15:04 matthiask

This is looking great! :-)

If I understand things correctly though, an error that raises an exception (such as the surrogates here) might pre-empt the checks, as it never gives them a chance to be displayed as part of the normal output.

carstenfuchs avatar Apr 01 '22 15:04 carstenfuchs

Oh, and this would be a check for the Django core, not for a Django app, not even a built-in.

carstenfuchs avatar Apr 01 '22 15:04 carstenfuchs

Since we haven't seen any thank yous or emojis in this thread, I'm closing this issue rather than implementing a check.

tim-schilling avatar Oct 23 '23 00:10 tim-schilling