torako
Saving PDF files consistently fails.
When torako encounters a post with a PDF attachment, it always fails to download that attachment.
Version(s) known to be affected: "0.11.2"
Boards known to occur on: /po/, /tg/
To reproduce: run torako targeting /po/ with media downloading enabled. Much of the media on /po/ consists of PDF files.
Example log output:
2022-03-15T05:52:33.883Z ERROR torako::storage::asagi > Downloading media failed: The image had an invalid content length: 200 OK: https://i.4cdn.org/tg/1646105029717.pdf
Yeah, I've known about this for a while. I assume you are using an S3 storage backend. The problem occurs because the S3/Backblaze API requires the content length before you start uploading, and in S3 mode torako does not save the image to disk, so the content length is never computed.
I know the Java S3 library can compute the content length on the fly by buffering internally. I'll check whether rusoto_s3 can do the same, which could fix the issue for S3 storage endpoints.
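For reference, the buffering idea can be sketched with the standard library alone. This is only an illustration of reading the body fully before uploading so the length is known up front. It is not torako's code and not rusoto_s3's API, and `buffer_with_length` is a hypothetical helper name.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads the whole body into memory so its exact
/// length is known before any upload request is built.
/// Assumption: the attachment is small enough to hold in memory.
fn buffer_with_length<R: Read>(mut body: R) -> io::Result<(Vec<u8>, u64)> {
    let mut buf = Vec::new();
    // read_to_end returns the number of bytes appended to `buf`.
    let n = body.read_to_end(&mut buf)?;
    Ok((buf, n as u64))
}
```

The trade-off is memory: every attachment is held in full before the upload starts, which is probably acceptable for board media but worth keeping in mind for large files.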
Just FYI, using the Content-Length of the response from 4chan is wrong in general. That header gives the encoded length of the response, which would break things if 4chan ever started applying an encoding to images (they currently don't, so it happens to work).
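A rough sketch of that distinction, assuming the header values have already been parsed out of the response: only treat Content-Length as the payload size when no content coding is applied. `trusted_length` is a hypothetical name for illustration, not something in torako.

```rust
/// Hypothetical helper: returns the Content-Length value only when it can
/// be taken as the size of the decoded payload, i.e. when the response has
/// no Content-Encoding or the encoding is "identity".
fn trusted_length(content_length: Option<u64>, content_encoding: Option<&str>) -> Option<u64> {
    match content_encoding.map(str::trim) {
        // No coding applied: encoded length and decoded length are the same.
        None | Some("") => content_length,
        Some(e) if e.eq_ignore_ascii_case("identity") => content_length,
        // gzip, br, etc.: the header counts encoded bytes, so do not reuse it.
        Some(_) => None,
    }
}
```

When this returns `None`, the caller would have to count the decoded bytes itself before uploading.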