django-debug-toolbar
UnicodeEncodeError in middleware.py
Hello,
using Django Debug Toolbar 3.2.4 with Django 4.0.3, I get the following stack trace:
Traceback (most recent call last):
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/debug_toolbar/middleware.py", line 93, in __call__
    response.content = insert_before.join(bits)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/template/response.py", line 143, in content
    HttpResponse.content.fset(self, value)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/http/response.py", line 387, in content
    content = self.make_bytes(value)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/http/response.py", line 296, in make_bytes
    return bytes(value.encode(self.charset))
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 181020-181021: surrogates not allowed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/core/handlers/wsgi.py", line 132, in __call__
    response = self.get_response(request)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/core/handlers/base.py", line 140, in get_response
    response = self._middleware_chain(request)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/core/handlers/exception.py", line 57, in inner
    response = response_for_exception(request, exc)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/core/handlers/exception.py", line 139, in response_for_exception
    response = handle_uncaught_exception(
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/core/handlers/exception.py", line 180, in handle_uncaught_exception
    return debug.technical_500_response(request, *exc_info)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/views/debug.py", line 67, in technical_500_response
    return HttpResponse(html, status=status_code, content_type="text/html")
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/http/response.py", line 355, in __init__
    self.content = content
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/http/response.py", line 387, in content
    content = self.make_bytes(value)
  File "/home/carsten/.virtualenvs/Lori/lib/python3.8/site-packages/django/http/response.py", line 296, in make_bytes
    return bytes(value.encode(self.charset))
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 5995-5996: surrogates not allowed
The related section around line 93 in debug_toolbar/middleware.py is:
# Always render the toolbar for the history panel, even if it is not
# included in the response.
rendered = toolbar.render_toolbar()
# …
# Insert the toolbar in the response.
content = response.content.decode(response.charset)
insert_before = dt_settings.get_config()["INSERT_BEFORE"]
pattern = re.escape(insert_before)
bits = re.split(pattern, content, flags=re.IGNORECASE)
if len(bits) > 1:
    bits[-2] += rendered
    response.content = insert_before.join(bits)
    if "Content-Length" in response:
        response["Content-Length"] = len(response.content)
return response
If I replace line
bits[-2] += rendered
with
bits[-2] += rendered.encode('ascii', 'replace').decode()
in order to get rid of any problematic characters, it works.
Unfortunately, I have no idea what might cause this, and I'm not sure how to proceed from here.
Do you know which characters are being rendered and in which panels they are coming from?
I modified the above to replace the German umlauts (äöüÄÖÜß) with something safe. Not elegant at all, but:
bits[-2] += rendered.replace('ä', 'XXX').replace('Ä', 'XXX').replace('ö', 'XXX').replace('Ö', 'XXX').replace('ü', 'XXX').replace('Ü', 'XXX').replace('ß', 'XXX').encode('ascii', 'backslashreplace').decode()
Replacing the umlauts alone was not enough; the encode-decode step is still necessary. With backslashreplace as the method for the remaining Unicode characters, this yields the attached screenshot. Note the \xbb near the top:
However, I'm not sure if this is actually the culprit – the unicode characters that cause the trouble might still be elsewhere.
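To illustrate why the escaped output alone does not identify the culprit, here is a minimal sketch of how the two error handlers used above treat an ordinary non-ASCII character and a pair of lone surrogates. The sample string is made up for illustration; it is not taken from the actual toolbar output.

```python
# Illustrative sample: one ordinary non-ASCII character (U+00BB) and
# two lone surrogates of the kind mentioned in the stack trace.
sample = "Ausblenden \u00bb / kalendereintr\udcc3\udca4ge.png"

# "replace" turns every character that ASCII cannot represent into "?".
replaced = sample.encode("ascii", "replace").decode()

# "backslashreplace" keeps the information as a visible escape sequence.
escaped = sample.encode("ascii", "backslashreplace").decode()

print(replaced)
print(escaped)
```

Both handlers hide harmless characters and surrogates alike. In the backslashreplace output, the harmless ones show up as \xNN, while lone surrogates show up as \udcNN, so searching for \udc narrows things down faster than searching for \x.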
Grepping the page HTML source for occurrences of \x, I found these fragments:
<a id="djHideToolBarButton" href="#" title="Toolbar ausblenden">Ausblenden \xbb</a></li>
<button type="button" class="djDebugClose">\xd7</button>
('nb', 'Norwegian Bokm\xe5l'),
Maybe it's one of these? If I can figure out what the original Unicode characters are for these, I can try to replace them as well.
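The original characters behind these escapes can be looked up with the standard library. A small sketch, where `resolve_escape` is a hypothetical helper name introduced only for this example:

```python
import unicodedata

# Escapes as they appear literally in the page source fragments.
escapes = ["\\xbb", "\\xd7", "\\xe5"]


def resolve_escape(escape):
    """Turn a literal '\\xNN' escape into (character, Unicode name)."""
    char = chr(int(escape[2:], 16))
    return char, unicodedata.name(char)


resolved = {escape: resolve_escape(escape) for escape in escapes}
for escape, (char, name) in resolved.items():
    print(escape, char, name)
```

All three are ordinary Latin-1 characters that encode to UTF-8 without trouble. None of them is a surrogate, so they are unlikely to be the source of the error.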
I made some progress and eventually managed to find the surrogates that are mentioned in the stack trace:
In one of my apps, I have static files with German umlaut characters in their names, e.g.
Handbuch/images/kalendereinträge.png
These files are listed in the "Static files" panel in section django.contrib.staticfiles.finders.AppDirectoriesFinder, where they cause the reported error.
My current work-around is to replace only the surrogate characters:
rendered = rendered.replace('\udcc3\udcbc', '___???___')
rendered = rendered.replace('\udcc3\udca4', '___???___')
bits[-2] += rendered
However, I still have no idea how the surrogate characters come up in the first place: my system is Ubuntu 20.04 LTS and there is nothing special about the above-mentioned files at all.
Can anyone reproduce this?
Is it possible that your filesystem encoding (in Python) isn't set to UTF-8? My systemd user units always contain the following environment variables: Environment=LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 LC_CTYPE=en_US.UTF-8
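One way to check this is to print the encodings Python derived from the environment. The sketch below uses only standard library calls; `encoding_report` is a hypothetical helper name. It has to run inside the server process (for example from a temporary view), since a shell session may have different locale settings than the web server.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python derived from the process environment."""
    return {
        # Encoding used to convert file names between bytes and str.
        "filesystem": sys.getfilesystemencoding(),
        # Error handler used for that conversion (often "surrogateescape").
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        # Locale-based default encoding for text files.
        "preferred": locale.getpreferredencoding(False),
    }


report = encoding_report()
print(report)
```

If "filesystem" reports something like ascii instead of utf-8, file names containing non-ASCII bytes cannot be decoded cleanly.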
I remember that we had many problems in the past without this; the server process would basically crash each time someone uploaded files containing umlauts. I'm not 100% sure whether this happened only with Python 2 or also with Python 3, though, so this could be a dead end.
... that being said, maybe the toolbar should expect filenames which cannot be properly converted to UTF-8...
@matthiask thanks for your hints! You're right, my filesystem encoding in Python was indeed set to ascii and I could resolve the problem by setting LANG=de_DE.UTF-8 in the Apache /etc/apache2/envvars config file. (I used to use LANG=C for over 10 years, so now I'm mildly worried that the change might introduce subtle side effects elsewhere. Maybe I'll still switch to en_US.UTF-8, after all.)
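For reference, the surrogates from the earlier work-around can be reproduced without touching the filesystem. This is a sketch that assumes the file name is stored on disk as UTF-8 bytes and that Python decodes it as ASCII with the surrogateescape error handler, which matches the ascii filesystem encoding reported above.

```python
# Bytes as stored on disk: the file name is UTF-8 encoded ("\u00e4" is "ä").
raw_name = "kalendereintr\u00e4ge.png".encode("utf-8")

# With an ASCII filesystem encoding, every undecodable byte 0xNN is
# smuggled through as the lone surrogate U+DCNN.
as_seen_by_python = raw_name.decode("ascii", "surrogateescape")
print(ascii(as_seen_by_python))

# Lone surrogates cannot be encoded as UTF-8; this is the error in the trace.
try:
    as_seen_by_python.encode("utf-8")
    failed = False
except UnicodeEncodeError:
    failed = True

# Reversing the escape recovers the bytes, which decode cleanly as UTF-8.
recovered = as_seen_by_python.encode("ascii", "surrogateescape").decode("utf-8")
print(failed, recovered)
```

The pair '\udcc3\udca4' is therefore the two UTF-8 bytes of "ä", and '\udcc3\udcbc' likewise corresponds to "ü".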
Although the problem turned out to be caused by my Apache config alone, this seems to be a very complicated topic. Maybe it would be possible for djdt to warn about non-UTF-8 filesystem encodings? (Or, better, about filenames that cannot be properly decoded?)
Thank you!
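The warning suggested above could be sketched roughly as follows. `find_unencodable_names` is a hypothetical helper, not part of django-debug-toolbar or Django, and the sample paths are made up; wiring it into a panel or a system check is left out.

```python
def find_unencodable_names(names, encoding="utf-8"):
    """Return the names that cannot be encoded, e.g. due to lone surrogates."""
    bad = []
    for name in names:
        try:
            name.encode(encoding)
        except UnicodeEncodeError:
            bad.append(name)
    return bad


# Illustrative input: one clean path and one with escaped UTF-8 bytes.
sample = ["css/site.css", "Handbuch/images/kalendereintr\udcc3\udca4ge.png"]
problems = find_unencodable_names(sample)
for name in problems:
    # ascii() prints the surrogates as escapes instead of failing on output.
    print("warning: file name not encodable as UTF-8:", ascii(name))
```

Reporting the offending names instead of encoding them blindly would let the page render and still point at the misconfigured environment.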
Oh yeah, it's complicated and very annoying.
I'm unsure what we should do. On one hand, django-debug-toolbar shouldn't crash; on the other hand, it's documented that Django expects a UTF-8 environment (not C) here: https://docs.djangoproject.com/en/4.0/ref/unicode/#files So maybe the somewhat strange behavior is to be expected?
I didn't even know that this was documented; this section was added recently (in 2015 😅)
Thanks for the link! My project started in 2011 and even though I regularly work through all Release Notes very carefully, it is easy to miss such updates in the docs, useful and worthwhile as they are.
IMHO, it would be ideal if this could be covered by a Django system check. That would be of limited use here, however, given that command-line shells and web servers tend to run with different environments.
We already discussed adding checks for issues which are (arguably) only surfaced but not really caused by django-debug-toolbar in the past; the last time it was about static files as well: https://github.com/jazzband/django-debug-toolbar/issues/1318
Such things are sometimes really hard to debug if you don't already know where to look, so I think it may be time to revisit my stance on this. I wrote that I am slightly against adding checks for other apps (even if those other apps are bundled with Django), but I'm not so sure anymore.
Here would probably be the place for such a new check: https://github.com/jazzband/django-debug-toolbar/blob/54e63f0494414ae0d93abc6e202d5f644c75952a/debug_toolbar/panels/staticfiles.py#L182-L203
This is looking great! :-)
If I understand things correctly, though, an error that raises an exception (such as the surrogates here) might preempt the checks, so that their messages never get a chance to be displayed as part of the normal output.
Oh, and this would be a check for the Django core, not for a Django app, not even a built-in.
Since we haven't seen any thank yous or emojis in this thread, I'm closing this issue rather than implementing a check.