django-sendfile2 icon indicating copy to clipboard operation
django-sendfile2 copied to clipboard

Non-ASCII filenames

Open bors-ltd opened this issue 3 years ago • 7 comments

Hi,

I ran into an issue when upgrading a project to Django 3.1 and bumping dependencies. But before sending a PR, I'd like to discuss the solution.

Serving files works fine, except in production when they contain non-ASCII characters. I printed the response generated to debug it::

Content-Type: text/plain
X-Sendfile: /example/Accentu\xe9.txt
Content-Disposition: inline; filename="Accentue.odt"; filename*=UTF-8''Accentu%C3%A9.txt
Content-length: 143543

The path is encoded in ISO-8859-1, which makes sense with HTTP, but the filesystem encoding is UTF-8.

So I tried to quote it using urllib.parse.quote::

X-Sendfile: /example/Accentu%C3%A9.txt

with no success. So let's just encode it ourselves adding os.fsencode(filename) in the xsendfile backend::

X-Sendfile: /example/Accentu\xc3\xa9.txt

and I could finally download the file.

I'm not sure why it used to work in Django 2.2 and/or older versions of sendfile2, I can't rollback to test and I don't really want to spend time on it.

But you may not consider that fsencode an elegant solution. I can see in other issues you want to support the case where the file is served by Apache from another server. It probably means a setting, with a fallback to the local filesystem encoding. Tell me and I'll tailor the PR.

If you reject that change altogether, I can just "fork" the xsendfile backend in my project and change the setting to point to it.

The nginx backend seems safe, as it quotes the path.

bors-ltd avatar Sep 07 '20 08:09 bors-ltd

I'd suggest you first write a test for unicode filenames and open a draft PR that just contains that. We will then see if Django 3.1 did really break something or not and go from there.

moggers87 avatar Sep 08 '20 10:09 moggers87

There are tests already covering unicode filenames. My issue may be more in Apache/X-Sendfile, but that won't be covered in a unit test.

My findings so far:

  • mod_xsendfile is unmaintained for like 10 years, good thing it still compiles against 2.4.
  • HTTP response headers might not actually accept non-ASCII characters, so my solution is just a hack, and nginx is right in urlquoting the path.
  • I found a fork adding that quoting https://github.com/nmaier/mod_xsendfile but unmaintained as well, and I use a shared hosting anyway.

So all in all, there is no "good" answer, no specification or documentation I can rely on to assert Django is producing the expected response header. mod_xsendfile never made it to the Apache official modules, that's surprising for such a useful module with no alternative after all this time.

bors-ltd avatar Sep 08 '20 15:09 bors-ltd

After a refactor I could actually serve the files directly from a memoryview, a new feature of Django 3, and I didn't need sendfile for this project anymore.

Feel free to close this issue as you see fit, but I hope that little summary of the mess X-Sendfile is was useful for your, other users, and to answer future issues.

bors-ltd avatar Sep 24 '20 13:09 bors-ltd

What is the current state of this issue?

With a simple method-based view, I can still reproduce this issue (currently using version 0.5.1):

from django_sendfile import sendfile

def sendfile_test(request):
    return sendfile(request, '/absolute/path/to/äüö.pdf')

This will yield the following entry inside the Apache error log:

[Tue Jun 28 15:24:39.895816 2022] [:error] [pid 2988] (2)No such file or directory: [client 192.168.XXX.XX:57604] xsendfile: cannot open file: /absolute/path/to/\xe4\xfc\xf6.pdf

The runserver setup works fine, but this does not help in production environments.

Replacing/monkey-patching the backend will yield the correct result. The patch includes encoding the path before passing it to the header inside the xsendfile backend:

filename = filename.encode('utf-8')

Looking at response._headers, the current implementation yields /absolute/path/to/äüö.pdf for the X-Sendfile header, whereas /absolute/path/to/äüö.pdf is given by the encoded version (and appears to work correctly).

stefan6419846 avatar Jun 28 '22 13:06 stefan6419846

mod_xsendfile is unmaintained for like 10 years, good thing it still compiles against 2.4.

@stefan6419846 that's the current status. I will add a warning to the docs before closing this issue, but it's well beyond the scope of this project to start maintaining an Apache module.

moggers87 avatar Jul 07 '22 14:07 moggers87

Does this indicate that this should rather be reported against mod_xsendfile as this is an issue there?

While mod_xsendfile might not be actively maintained, it seems to still work without any real issues (at least given my requirements). For this reason, from my perspective it should not hurt to actually include the aforementioned patch - unless there are some other cases which will break and which I am not aware of. Unfortunately, for some user bases non-ASCII filenames are quite common, which includes Germany as in my case, rendering the default xsendfile backend useless.

stefan6419846 avatar Jul 07 '22 15:07 stefan6419846

Just stumbled over this thread while I was looking if there is any update for the xsendfile backend.

filename = filename.encode('utf-8')

We are using that workaround in production since 2020 and had no issues since then.

tidenhub avatar Jul 26 '23 13:07 tidenhub