http-extensions
Resumable Uploads: Describe interaction with Digest Fields
This PR adds a section describing a possible interaction between resumable uploads and digest fields. I am not yet entirely sure whether this style is appropriate in such a draft at all. The intention is to help readers understand how digests could help in certain use cases and how implementations should handle the interaction between resumable uploads and digests.
Closes #2408
The digest spec is very loose about rules for what to do on failure and punts that up to other specs.
The text here seems technically correct but raises a few questions, mainly to do with reacting to failure. For example, if an append fails an integrity check on the server side, how does it signal back to the client and what are the expectations on the client to recover?
The PR currently opens more questions than it answers, which might lead to hard-to-troubleshoot interop problems. So a bit more thought and discussion is needed. For example, can we survey how S3, Google Drive, etc. expect integrity failures to be handled?
For example, if an append fails an integrity check on the server side, how does it signal back to the client and what are the expectations on the client to recover?
I agree, these points should be mentioned. The draft could recommend that servers respond with a 400 status code. As far as I know, there is no status code specifically to indicate a failed integrity check, and RFC 9530 does not make any recommendation (which might be on purpose, I assume). On the other hand, a 400 makes it challenging for the client to discover that the underlying cause was a failed integrity check.
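For illustration, a failed append might then look roughly like this. This is only a sketch: the header names are taken from recent versions of the resumable uploads draft and from RFC 9530, the digest value is a placeholder rather than the real digest of the content, and the bare 400 reflects the recommendation discussed here, not anything the drafts currently mandate.

```
PATCH /uploads/abc123 HTTP/1.1
Host: example.com
Upload-Offset: 100
Upload-Complete: ?0
Content-Type: application/partial-upload
Content-Digest: sha-256=:RK/6PsVnxbHkyZbDmVjcQd/zSwSnWXGYGvXZHPkfCSM=:
Content-Length: 500

[500 bytes of partial upload content]
```

```
HTTP/1.1 400 Bad Request
Content-Length: 0
```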
I am unsure what the best reaction of the client would be if it discovers a failed integrity check. It could try to retransmit the upload part in the hope that the integrity failure was caused by a fault in the transmission (e.g. a bit flip), although it is questionable how often this occurs in practice. In most situations, the client would bubble the error up to the end user, as an integrity check failure often points to bugs in the implementation, in my experience. Would you have any other recommendation?
For example, can we survey how S3, Google Drive, etc. expect integrity failures to be handled?
I looked into S3 and the client can supply a checksum in the request headers or trailers (see https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html). If the request content's checksum calculated by the server does not match the provided value, it will reject the request with a 400 response code (see BadDigest at https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html).
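For reference, the error body S3 sends in that case is an XML document carrying the BadDigest code. Roughly, simplified from the linked documentation and with illustrative values:

```
HTTP/1.1 400 Bad Request
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>BadDigest</Code>
  <Message>The Content-MD5 or checksum value that you specified did
    not match what the server received.</Message>
</Error>
```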
For example, if an append fails an integrity check on the server side, how does it signal back to the client and what are the expectations on the client to recover?
I agree, these points should be mentioned. The draft could recommend that servers respond with a 400 status code. As far as I know, there is no status code specifically to indicate a failed integrity check, and RFC 9530 does not make any recommendation (which might be on purpose, I assume). On the other hand, a 400 makes it challenging for the client to discover that the underlying cause was a failed integrity check.
We could use problem details in the response; see https://www.rfc-editor.org/rfc/rfc9457.html. This actually seems aligned with how S3 responds with BadDigest in your example, except it would be a standard approach that we can cite. If going that route, we might want to consider defining problem type extensions specific to digests, such as returning the failed calculated digest value.
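A sketch of what such a response could look like; the problem type URI and the extension members are placeholders invented for illustration, not defined anywhere:

```
HTTP/1.1 400 Bad Request
Content-Type: application/problem+json

{
  "type": "https://example.com/problems/digest-mismatch",
  "title": "content digest mismatch",
  "status": 400,
  "provided-digest": "sha-256=:RK/6PsVnxbHkyZbDmVjcQd/zSwSnWXGYGvXZHPkfCSM=:",
  "calculated-digest": "sha-256=:d435Qo+nKZ+gLcUHn7GQtQ72hiBVAgqoLsZnZPiTGPk=:"
}
```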
I am unsure what the best reaction of the client would be if it discovers a failed integrity check. It could try to retransmit the upload part in the hope that the integrity failure was caused by a fault in the transmission (e.g. a bit flip), although it is questionable how often this occurs in practice. In most situations, the client would bubble the error up to the end user, as an integrity check failure often points to bugs in the implementation, in my experience. Would you have any other recommendation?
If the client is speaking "directly" to an upload server, I suspect a Content-Digest for an append has a low risk of failing, thanks to transport protocol integrity (e.g. TLS or QUIC). However, any case of intermediation increases the number of processing steps and transport hops, and so the risks increase. Allowing an upload server sitting behind N levels of intermediation to detect some form of corruption and signal that back seems of potential value to me.
Regarding bubbling up of errors: the upload append section currently states:
The client SHOULD NOT automatically retry if a client error status code between 400 and 499 (inclusive) is received.
It's an interesting question whether a user would want to be pestered with errors or dialog boxes to retry a failed upload append due to a digest failure, or whether their user agent could attempt, say, 3 retries automatically before giving up and flagging a problem.
We could use problem details in the response; see https://www.rfc-editor.org/rfc/rfc9457.html. This actually seems aligned with how S3 responds with BadDigest in your example, except it would be a standard approach that we can cite. If going that route, we might want to consider defining problem type extensions specific to digests, such as returning the failed calculated digest value.
Problem details would be a great fit here and would be better suited than a status code alone to indicate mismatching digests. I am interested in exploring possible problem types for digest fields in general, which could also be used for use cases like resumable uploads.
I could also imagine that we define specific problem types for resumable uploads, for example to indicate details about mismatching offsets.
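For example, a hypothetical offset-mismatch problem type could carry the offsets that both sides believe to be correct. Again, the type URI, the member names, and the 409 status shown here are illustrative placeholders, not something the draft defines:

```
HTTP/1.1 409 Conflict
Content-Type: application/problem+json

{
  "type": "https://example.com/problems/offset-mismatch",
  "title": "upload offset mismatch",
  "status": 409,
  "expected-offset": 1048576,
  "provided-offset": 524288
}
```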
Regarding bubbling up of errors: the upload append section currently states:
The client SHOULD NOT automatically retry if a client error status code between 400 and 499 (inclusive) is received.
It's an interesting question whether a user would want to be pestered with errors or dialog boxes to retry a failed upload append due to a digest failure, or whether their user agent could attempt, say, 3 retries automatically before giving up and flagging a problem.
The recommendation to not retry on 4xx was mainly motivated by 404 or 403 responses. In these cases, retrying without modifying the request will likely not help because the upload resource doesn't exist at all or is not accessible to the end user. However, there are also some 4xx status codes where a retry of the same request could be beneficial, such as 408 Request Timeout, 423 Locked, or 429 Too Many Requests. This also matches the experience from a few tus implementations, where we arrived at a setup in which the client only retries on 423 and 429, while also allowing the end user to override this behavior. I think we should update the text in this draft to include these considerations.
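As a concrete illustration of that behavior, a client that retries only on 423 and 429 could honor Retry-After before repeating the append. The values are made up and the header names follow recent versions of the resumable uploads draft:

```
PATCH /uploads/abc123 HTTP/1.1
Host: example.com
Upload-Offset: 100
Upload-Complete: ?0
Content-Type: application/partial-upload

[partial upload content]
```

```
HTTP/1.1 429 Too Many Requests
Retry-After: 5
```

After waiting the indicated 5 seconds, the client would re-fetch the current upload offset if needed and retry the same append, whereas a 403 or 404 would be surfaced to the end user immediately.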