http4s-blaze-client: File upload (Multipart) fails if server doesn't support "Transfer-Encoding: chunked"
Issue in http4s-blaze-client
Summary
A POST request with Multipart body for uploading a file to GitHub Release using GitHub API fails with the following error message.
{
"message": "Bad Content-Length: ",
"request_id": "SOME_ID_HERE"
}
Cause
It looks like GitHub API doesn't support Transfer-Encoding: chunked header.
It's confirmed by the following curl command.
curl -v -i -F upload=@"$(pwd)/path/to/some.jar" \
-H "Authorization: token ${github_api_token}" \
-H "Transfer-Encoding: chunked" \
https://uploads.github.com/repos/USER/REPO/releases/RELEASE_ID/assets?name=some.jar
# fails with "message":"Bad Content-Length: "
If I remove -H "Transfer-Encoding: chunked from the header, it succeeds. curl sets the Content-Lengthwhen Transfer-Encoding: chunked is not set.
Detailed Description
When I do it using http4s-blaze-client, it fails since Multipart.headers has Transfer-Encoding: chunked.
If I remove Transfer-Encoding: chunked from the Multipart.headers before adding it to Request, it can successfully upload a file when the file size is small. I don't know how small is small (maybe default buffer size?), but the file with about 719k (719,018 bytes) was OK while the one with about 2.3M (2,301,910 bytes) was not.
If I remove the Transfer-Encoding: chunked header and set the Content-Length manually, it can successfully upload the file.
I don't think it matters but these are the Scala and http4s versions. Scala version: 2.12.12 http4s version: 0.21.16
My guess for GitHub API not supporting Transfer-Encoding is probably because of this?
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding
Since HTTP 2 doesn't support it and the current GitHub API is designed for HTTP 2 so it has probably dropped any HTTP 1/1.1 specific things?
Is there any way to make http4s-blaze-client automatically set Content-Length?
Setting the Content-Length manually might be problematic because the length is not just about the file. The length should be the length of the entire request content so it is bigger than the file size.
Entity Encoders will set a Content-Length header when they know the content length in advance. Because Multipart is full of parts with streams, and because we don't know the length of a stream before we run it, Multipart doesn't. And that's why we fall back to chunked encoding.
Workarounds:
- If you know the exact length of your encoded body (including boundaries and such!) you can set your own
Content-Lengthheader after you callwithEntity. (I can't remember if we explicitly set chunked encoding, or wait for the backend to do it. You might need to strip that, too.) - If you don't know the exact length of your body, and don't care about streaming, compile your request body to a
ByteChunk, and then.withEntity(body = Stream.chunk(theCompiledChunk))and set the header yourself. (That might reset yourContent-Type. Check that, too. We can add API to make this more convenient if this is the way.) - If every
Partin yourMultiparthas a defined contentLength header, we could plausibly add them up (plus other prat headers plus boundaries) and set aContent-Lengthautomatically through the entity encoder. This would have to be an enhancement in the library.
All three of those are bad answers and we should make this better.
@rossabaker Thanks for your answer.
I think it should be able to figure out the size when the stream source of the Multipart if the source is a file as we can get the size of the file easily (file.length()).
I think I can probably try the second option which is compiling to ByteChunk for now.
Regarding a proper solution, what about adding an optional length info to org.http4s.multipart.Part?
e.g.)
final case class Part[F[_]](headers: Headers, body: Stream[F, Byte], length: Option[Long]) extends Media[F]
When a Part is created, it can be set if it's a File. However, I'm not sure if this would helpful to solve the problem. I can check if the Multipart or MultipartEncoder can use it to set the its headers. If you think it's worth trying, I can do it and create a PR once I confirm that it's working. At this point, I'm not sure if it's a good idea since I don't know the reasons behind the design decisions.
Sorry, never mind what I said about Part. It doesn't look so helpful.
Using Chunk[Byte] as a Request body with removal of Transfer-Encoding: chunked works. Thank you!
I'll use it for now and will also think about any better ways.