django-sendfile2
django-sendfile2 copied to clipboard
Non-ASCII filenames
Hi,
I ran into an issue when upgrading a project to Django 3.1 and bumping dependencies. But before sending a PR, I'd like to discuss the solution.
Serving files works fine, except in production when they contain non-ASCII characters. I printed the response generated to debug it::
Content-Type: text/plain
X-Sendfile: /example/Accentu\xe9.txt
Content-Disposition: inline; filename="Accentue.odt"; filename*=UTF-8''Accentu%C3%A9.txt
Content-length: 143543
The path is encoded in ISO-8859-1, which makes sense with HTTP, but the filesystem encoding is UTF-8.
So I tried to quote it using urllib.parse.quote::
X-Sendfile: /example/Accentu%C3%A9.txt
with no success. So let's just encode it ourselves adding os.fsencode(filename)
in the xsendfile backend::
X-Sendfile: /example/Accentu\xc3\xa9.txt
and I could finally download the file.
I'm not sure why it used to work in Django 2.2 and/or older versions of sendfile2, I can't rollback to test and I don't really want to spend time on it.
But you may not consider that fsencode
an elegant solution. I can see in other issues you want to support the case where the file is served by Apache from another server. It probably means a setting, with a fallback to the local filesystem encoding. Tell me and I'll tailor the PR.
If you reject that change altogether, I can just "fork" the xsendfile backend in my project and change the setting to point to it.
The nginx backend seems safe, as it quotes the path.
I'd suggest you first write a test for unicode filenames and open a draft PR that just contains that. We will then see if Django 3.1 did really break something or not and go from there.
There are tests already covering unicode filenames. My issue may be more in Apache/X-Sendfile, but that won't be covered in a unit test.
My findings so far:
- mod_xsendfile is unmaintained for like 10 years, good thing it still compiles against 2.4.
- HTTP response headers might not actually accept non-ASCII characters, so my solution is just a hack, and nginx is right in urlquoting the path.
- I found a fork adding that quoting https://github.com/nmaier/mod_xsendfile but unmaintained as well, and I use a shared hosting anyway.
So all in all, there is no "good" answer, no specification or documentation I can rely on to assert Django is producing the expected response header. mod_xsendfile never made it to the Apache official modules, that's surprising for such a useful module with no alternative after all this time.
After a refactor I could actually serve the files directly from a memoryview, a new feature of Django 3, and I didn't need sendfile for this project anymore.
Feel free to close this issue as you see fit, but I hope that little summary of the mess X-Sendfile is was useful for your, other users, and to answer future issues.
What is the current state of this issue?
With a simple method-based view, I can still reproduce this issue (currently using version 0.5.1):
from django_sendfile import sendfile
def sendfile_test(request):
return sendfile(request, '/absolute/path/to/äüö.pdf')
This will yield the following entry inside the Apache error log:
[Tue Jun 28 15:24:39.895816 2022] [:error] [pid 2988] (2)No such file or directory: [client 192.168.XXX.XX:57604] xsendfile: cannot open file: /absolute/path/to/\xe4\xfc\xf6.pdf
The runserver setup works fine, but this does not help in production environments.
Replacing/monkey-patching the backend will yield the correct result. The patch includes encoding the path before passing it to the header inside the xsendfile backend:
filename = filename.encode('utf-8')
Looking at response._headers
, the current implementation yields /absolute/path/to/äüö.pdf
for the X-Sendfile
header, whereas /absolute/path/to/äüö.pdf
is given by the encoded version (and appears to work correctly).
mod_xsendfile is unmaintained for like 10 years, good thing it still compiles against 2.4.
@stefan6419846 that's the current status. I will add a warning to the docs before closing this issue, but it's well beyond the scope of this project to start maintaining an Apache module.
Does this indicate that this should rather be reported against mod_xsendfile
as this is an issue there?
While mod_xsendfile
might not be actively maintained, it seems to still work without any real issues (at least given my requirements). For this reason, from my perspective it should not hurt to actually include the aforementioned patch - unless there are some other cases which will break and which I am not aware of. Unfortunately, for some user bases non-ASCII filenames are quite common, which includes Germany as in my case, rendering the default xsendfile
backend useless.
Just stumbled over this thread while I was looking if there is any update for the xsendfile backend.
filename = filename.encode('utf-8')
We are using that workaround in production since 2020 and had no issues since then.