webmachine content-md5 checking bugs/architecture

Open reiddraper opened this issue 11 years ago • 1 comment

I've run into a couple of issues with the Content-MD5 checking in Webmachine. This issue is to discuss the steps for fixing them, as there are some subtleties.

1. The current code assumes the header value is hex-encoded rather than base64-encoded. This is an easy fix.
2. If the body hasn't been treated as a stream yet, the current code reads it (up to 50MB) into memory as a single, fully buffered binary. Bodies over 50MB are hardcoded to return HTTP 413. If the body hasn't been read at all (which is quite likely), this also means that if you later try to treat it as a stream (say in accept_body), Webmachine exits here.
3. For bodies that will always be streamed by the end-resource callback, the md5 should be calculated as the user consumes the stream. Otherwise, the stream will either be buffered in memory (bad) or already be consumed by the time the user wants it. This has the downside that the user wouldn't be notified of content-md5 errors until after they've processed the stream, but I'm not sure we have much choice. My first thought is to add another callback, such as validated_content_checksum/3 (note the past tense), that would be called after the stream has been processed but before an HTTP 20* response is returned. The third argument would be a boolean() indicating whether the content-md5 was correct. If it's false, this would give the resource author the chance to 'back out' anything they had done with the resource as it was streamed in. In some ways, this supersedes issue 2.
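
To make the first and third points above a bit more concrete, here is a rough sketch in Python rather than Erlang, purely to show the shape of the idea. None of these names are Webmachine APIs: `content_md5` and `ChecksummedStream` are hypothetical helpers. The sketch assumes the header carries the base64 encoding of the raw 16-byte digest (not the hex string), and it only knows whether the checksum matched once the consumer has read the whole stream.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Header value: base64 of the raw 16-byte MD5 digest, not the hex string."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


class ChecksummedStream:
    """Hypothetical wrapper that hashes each chunk as the consumer reads it."""

    def __init__(self, chunks, expected_header: str):
        self._chunks = chunks
        self._expected = expected_header.strip()
        self._md5 = hashlib.md5()
        self.exhausted = False

    def __iter__(self):
        for chunk in self._chunks:
            # Update the running digest; the wrapper never buffers the body.
            self._md5.update(chunk)
            yield chunk
        self.exhausted = True

    def valid(self) -> bool:
        """Compare digests; only meaningful after the stream is fully consumed."""
        if not self.exhausted:
            raise RuntimeError("stream not fully consumed yet")
        actual = base64.b64encode(self._md5.digest()).decode("ascii")
        return actual == self._expected


# Usage: the resource consumes the stream first, then learns the result.
header = content_md5(b"hello world")
stream = ChecksummedStream(iter([b"hello ", b"world"]), header)
received = b"".join(stream)  # stands in for the resource processing chunks
checksum_ok = stream.valid()  # a False here is the cue to 'back out'
```

The result of `valid()` plays the role of the boolean that would be passed to something like validated_content_checksum/3: it is available only after the resource has already acted on the data, which is exactly the downside described above.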

Thoughts?

cc/ @seancribbs @vinoski @kellymclaughlin

Jan 29 '13 20:01 reiddraper
