tika-python
Incorrect filename in Content-Disposition header
The fixes for #167, #124, #225 and #285 only mask the error; they never generate the correct Content-Disposition header.
With those fixes:
When rfc6266 is installed, we get a TypeError, as reported in #274.
When rfc6266 is not installed, we get an incorrect filename in the Content-Disposition header. For example, if the filename is hello.c, instead of Content-Disposition: attachment; filename=hello.c we get Content-Disposition: attachment; filename=b'hello.c'. This may explain #333.
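A minimal sketch of how a b'...' filename can end up in the header, assuming the filename reaches the header-building code as bytes and is formatted into a str. The helper content_disposition is hypothetical and is not tika-python's actual code; it only illustrates decoding before formatting.

```python
import os


def content_disposition(filename):
    """Hypothetical helper (not part of tika-python): build the header value
    from a filename that may be either str or bytes."""
    if isinstance(filename, bytes):
        # Assumption: filenames are UTF-8 encoded when passed as bytes.
        filename = filename.decode("utf-8")
    return "attachment; filename=%s" % os.path.basename(filename)


# Formatting bytes into a str uses the bytes repr in Python 3,
# which produces the b'...' wrapper seen in the incorrect header.
broken = "attachment; filename=%s" % b"hello.c"

# Decoding first yields the intended header value.
fixed = content_disposition(b"hello.c")

print(broken)
print(fixed)
```

Decoding at the point where the header is built keeps the rest of the call path free to pass either type.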
With incorrect filenames, Tika's content detection may return different file types. Using the same hello.c as an example: with Content-Disposition: attachment; filename=hello.c, Tika content detection returns text/x-csrc, while with Content-Disposition: attachment; filename=b'hello.c', Tika returns text/plain, because Tika thinks the file name is b'hello.c'.
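The effect on name-based detection can be illustrated with a small sketch. It assumes that Tika matches file names against glob patterns such as *.c for text/x-csrc; the GLOBS table and guess_by_name function below are hypothetical stand-ins, not Tika's implementation, and the text/plain fallback only mirrors the result reported above.

```python
import fnmatch

# Hypothetical, heavily reduced stand-in for a glob-to-type table.
GLOBS = {"*.c": "text/x-csrc"}


def guess_by_name(filename, fallback="text/plain"):
    """Return the type of the first glob that matches, else the fallback."""
    for pattern, mime in GLOBS.items():
        if fnmatch.fnmatchcase(filename, pattern):
            return mime
    return fallback


good = guess_by_name("hello.c")
# The trailing quote means the name no longer ends in ".c".
bad = guess_by_name("b'hello.c'")

print(good)
print(bad)
```

Because the mangled name ends in a quote character, no extension pattern matches, and detection has to fall back on the content alone.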
Interesting. Please propose a patch to fix this if you have time. Thanks @tongwang. If you submit a PR, I will take a look at it for the next release.