webmachine
content-md5 checking bugs/architecture
I've run into a couple of issues with the Content-MD5 checking in Webmachine.
This issue is to discuss the steps for fixing them, as there are some subtleties.
- The current code assumes hex encoding instead of base64. RFC 1864 defines the header value as the base64 encoding of the 128-bit digest. This is an easy fix.
- The current code will read up to 50MB of the body into memory as a fully buffered binary if the body hasn't already been treated as a stream. Bodies over 50MB are hardcoded to return HTTP 413. The body often hasn't been read at all at that point. In that case, if you later try to treat it as a stream (say in accept_body), Webmachine exits here.
- For bodies that the end-resource callback will always stream, the MD5 can't be checked before the callback runs. If it were, the body would either be buffered in memory (bad) or already consumed by the time the user wants to read the stream. One fix would be to calculate the MD5 incrementally as the user consumes the stream. The downside is that the user wouldn't be notified of Content-MD5 errors until after they've processed the stream, but I'm not sure we have much choice.
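The first and third points can be illustrated together with a language-neutral sketch. Webmachine itself is Erlang, so this Python code is not its API. MD5CheckingReader is a hypothetical wrapper that base64-decodes the Content-MD5 header and updates the hash chunk by chunk as the resource reads the body. Under this approach the wrapper never holds more than one chunk, and the result is only known once the body has been fully consumed.

```python
import base64
import hashlib
import io


class MD5CheckingReader:
    """Hypothetical wrapper (not part of Webmachine) that hashes a request
    body incrementally as the resource reads it, chunk by chunk."""

    def __init__(self, stream, content_md5_header):
        self._stream = stream
        self._md5 = hashlib.md5()
        # RFC 1864: the header value is the base64 encoding of the raw
        # 16-byte digest (24 characters), not a 32-character hex string.
        self._expected = base64.b64decode(content_md5_header)
        self._done = False

    def read_chunk(self, size=8192):
        """Return the next chunk (b"" at end of body), updating the hash."""
        chunk = self._stream.read(size)
        if chunk:
            self._md5.update(chunk)
        else:
            self._done = True
        return chunk

    def checksum_valid(self):
        """Compare digests; only meaningful once the body is fully read."""
        if not self._done:
            raise RuntimeError("body has not been fully consumed")
        return self._md5.digest() == self._expected


# Usage: the resource streams the body; the wrapper holds one chunk at a time.
body = b"hello webmachine" * 1000
header = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
reader = MD5CheckingReader(io.BytesIO(body), header)
total = 0
while True:
    chunk = reader.read_chunk(1024)
    if not chunk:
        break
    total += len(chunk)
print(len(header), total, reader.checksum_valid())  # 24 16000 True
```

The trade-off noted above is visible here: checksum_valid() raises until the last chunk has been read. A mismatch can therefore only be reported after the resource has already processed the data.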
My first thought is to add another callback such as validated_content_checksum/3 (note the past tense, in contrast to the existing validate_content_checksum callback). It would be called after the stream has been processed but before an HTTP 2xx is returned. The third argument would be a boolean() indicating whether the Content-MD5 was correct. If it's false, the resource author gets the chance to 'back out' anything they did with the resource as it was streamed in. In some ways, this supersedes issue 2.
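The ordering proposed above can be sketched, again in Python purely as an illustration of the control flow rather than Webmachine's Erlang API. First, hash while the resource consumes the body. Then report a boolean to a callback modeled on the proposed validated_content_checksum. Only then choose the response. The names accept_streamed_body and rollback_on_mismatch are hypothetical, and the 204/400 status codes are assumptions chosen for the example.

```python
import base64
import hashlib
import io


def accept_streamed_body(stream, content_md5_header, on_chunk,
                         validated_content_checksum, chunk_size=4096):
    """Hypothetical request flow modeled on the proposal: hash while the
    resource consumes the body, then report the result before replying."""
    expected = base64.b64decode(content_md5_header)
    md5 = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        md5.update(chunk)
        on_chunk(chunk)  # the resource processes each chunk as it arrives
    valid = md5.digest() == expected
    # Called after the body is processed but before any 2xx is sent, so the
    # resource can back out partial work when valid is False.
    validated_content_checksum(valid)
    # Assumed status codes: 204 on success, 400 on a checksum mismatch.
    return 204 if valid else 400


# Usage: a resource that "stores" chunks and rolls them back on a mismatch.
stored = []


def rollback_on_mismatch(valid):
    if not valid:
        stored.clear()


wrong_header = base64.b64encode(hashlib.md5(b"other").digest()).decode("ascii")
status = accept_streamed_body(io.BytesIO(b"payload" * 100), wrong_header,
                              stored.append, rollback_on_mismatch)
print(status, len(stored))  # 400 0
```

Here the callback only receives the boolean. How a real Webmachine resource would also receive its request data and context, and what response it could influence, is left open, as in the proposal.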
Thoughts?
cc/ @seancribbs @vinoski @kellymclaughlin
See also this PR.