tika-python

Incorrect filename in Content-Disposition header

Open tongwang opened this issue 3 years ago • 1 comments

The fixes for #167, #124, #225 and #285 only mask the error; they never generate the correct Content-Disposition header.

With those fixes, the behavior depends on whether rfc6266 is installed. When rfc6266 is installed, we get the TypeError reported in #274. When rfc6266 is not installed, we get an incorrect filename in the Content-Disposition header. For example, if the filename is hello.c, instead of Content-Disposition: attachment; filename=hello.c, we get Content-Disposition: attachment; filename=b'hello.c'. This may explain #333.
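The b'hello.c' value looks like what Python 3 produces when a bytes object is formatted into a str. The snippet below is only a sketch of that symptom and one possible remedy, not the actual tika-python code: the function names naive_header and safer_header are hypothetical, and it assumes the filename arrives as UTF-8 encoded bytes.

```python
def naive_header(filename):
    # Hypothetical reproduction: formatting bytes into a str on Python 3
    # inserts the bytes repr, including the b'...' wrapper.
    return "attachment; filename=%s" % filename


def safer_header(filename):
    # Assumption: the filename is UTF-8 encoded when it arrives as bytes.
    if isinstance(filename, bytes):
        filename = filename.decode("utf-8")
    return "attachment; filename=%s" % filename


print(naive_header(b"hello.c"))  # attachment; filename=b'hello.c'
print(safer_header(b"hello.c"))  # attachment; filename=hello.c
```

This sketch does not cover quoting or non-ASCII filenames, which a full fix would also need to consider.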

With incorrect filenames, Tika's content detection may return different file types. Using the same hello.c as an example: with Content-Disposition: attachment; filename=hello.c, Tika's content detection returns text/x-csrc, while with Content-Disposition: attachment; filename=b'hello.c', Tika returns text/plain, because Tika thinks the file name is b'hello.c'.
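To illustrate why the extra b'...' wrapper changes the result, here is a rough sketch of name-based matching using Python's standard fnmatch module. It is not Tika's detector; the helper matches_c_source and the "*.c" pattern are assumptions chosen only to show that the mangled name no longer ends in .c.

```python
from fnmatch import fnmatchcase


def matches_c_source(filename):
    # Hypothetical stand-in for a "*.c" filename rule in a type detector.
    return fnmatchcase(filename, "*.c")


print(matches_c_source("hello.c"))     # True
print(matches_c_source("b'hello.c'"))  # False: the name ends in .c' instead
```

A detector that cannot match the name against a known pattern has to fall back on the content alone, which is consistent with the text/plain result described above.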

tongwang, Nov 18 '21 19:11

Interesting. Please propose a patch to fix this if you have time. Thanks @tongwang. If you submit a PR, I will take a look and include it in the next release.

chrismattmann, Dec 31 '22 21:12