Stream based blob API

Open martinheidegger opened this issue 9 years ago • 4 comments

The current blob API is not stream-based. It would be good if it worked like fs.createWriteStream or fs.createReadStream.


martinheidegger avatar Sep 01 '15 14:09 martinheidegger

I think the problem is that the sha1 checksum needs to be calculated in advance. See: https://crate.io/docs/stable/blob.html "To upload a blob the sha1 hash of the blob has to be known upfront since this has to be used as the id of the new blob"
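As an illustration of what "known upfront" implies for the client, here is a minimal sketch that computes the SHA-1 of a file in chunks before any upload starts, so the whole blob never has to sit in memory. The function name `sha1OfFileSync` and the default chunk size are assumptions made for this example; they are not part of node-crate.

```javascript
var crypto = require('crypto')
var fs = require('fs')

// Hypothetical helper: hash a file in fixed-size chunks so that only one
// chunk is held in memory at a time.
function sha1OfFileSync(path, chunkSize) {
  var hash = crypto.createHash('sha1')
  var buffer = Buffer.alloc(chunkSize || 64 * 1024)
  var fd = fs.openSync(path, 'r')
  try {
    var bytesRead
    // position === null means "continue from the current file position"
    while ((bytesRead = fs.readSync(fd, buffer, 0, buffer.length, null)) > 0) {
      hash.update(buffer.slice(0, bytesRead))
    }
  } finally {
    fs.closeSync(fd)
  }
  return hash.digest('hex')
}
```

The resulting hex digest is what would be used as the blob id. The cost is that the data is read twice: once for hashing and once for the upload.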

megastef avatar Sep 02 '15 10:09 megastef

I talked with Jodok yesterday, and as far as I understood, that is indeed the case. I think it would still be possible to implement it using a file-system step in between.

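var crypto = require('crypto')
var fs = require('fs')
var stream = require('stream')

// createHttpBlobRequest(hash) stands for the actual blob upload request (not defined here)
function createWriteStream() {
   var shasum = crypto.createHash('sha1')
   // PassThrough instead of a bare Duplex, which has no _read/_write implementation
   var input = new stream.PassThrough()
   var fsStream = fs.createWriteStream('tmpfile')
   input.on('data', function (data) {
     shasum.update(data)
   })
   input.pipe(fsStream)
   // writable streams emit 'finish' (not 'end') once all data has been flushed
   fsStream.on('finish', function () {
     fs.createReadStream('tmpfile').pipe(createHttpBlobRequest(shasum.digest('hex')))
   })
   return input
}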

This would still be better than leaving users to implement it themselves.

martinheidegger avatar Sep 02 '15 15:09 martinheidegger

+0.5. It solves the problem of uploading files larger than the heap limit, and it could reduce memory usage. Your example looks easy, but it does not deal with the problems raised by using temporary files for 'streams':

  1. Management of temporary files (naming, location, deletion in various error scenarios ...)
  2. Delay in upload. I guess streaming only makes sense for large files or realtime data over networks (like IP cams). In the case of live video, the upload starts only at the end of the stream, possibly after a long time.
  3. 'Endless streams' like video from an IP camera or continuous packet captures could fill up the disk without sending a single byte to Crate! So the implementation might not meet the expectations of API users. That means more documentation about it, plus limits for file size and timeouts for streams that don't provide data for N minutes (to close the temporary file ...).
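The size limit and idle timeout from point 3 could be sketched roughly as below. This is only an illustration under assumptions: `createByteCounter` and `createLimitedStream` are hypothetical names, and the limits would have to be chosen by the driver or its user.

```javascript
var stream = require('stream')

// Hypothetical helper: running byte count with a hard upper bound.
function createByteCounter(maxBytes) {
  var total = 0
  return function add(chunkLength) {
    total += chunkLength
    return total <= maxBytes // false once the limit has been exceeded
  }
}

// Hypothetical Transform that passes data through, fails when more than
// maxBytes arrive, and destroys itself when no chunk arrives for idleMs.
function createLimitedStream(maxBytes, idleMs) {
  var withinLimit = createByteCounter(maxBytes)
  var timer = null
  var limited = new stream.Transform({
    transform: function (chunk, encoding, callback) {
      armIdleTimer() // data arrived, so restart the idle clock
      if (!withinLimit(chunk.length)) {
        return callback(new Error('size limit exceeded'))
      }
      callback(null, chunk)
    },
    flush: function (callback) {
      clearTimeout(timer) // source ended normally
      callback()
    }
  })
  function armIdleTimer() {
    clearTimeout(timer)
    timer = setTimeout(function () {
      limited.destroy(new Error('no data for ' + idleMs + ' ms'))
    }, idleMs)
  }
  limited.on('close', function () { clearTimeout(timer) })
  armIdleTimer()
  return limited
}
```

Placed between the user's stream and the temporary file (for example `source.pipe(createLimitedStream(limit, timeout)).pipe(fsStream)`), an error on this stream would be the signal to close and delete the temporary file. The cleanup itself is not shown here.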

Well, we could start with a simple version, but I would recommend that Crate accept streams without the SHA-1 hash in the URL (or simply allow user-defined IDs). Crate could calculate the hash during reception and return it in the HTTP response. The client driver could calculate the hash during upload and verify it against the server's value.
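The client side of that proposal might look something like the sketch below: the data is hashed while it flows towards the server, and the digest is compared with whatever the server reports afterwards. Note that the server returning a hash is the proposal above, not existing Crate behaviour; `createHashingStream` and `digestsMatch` are hypothetical names for this example.

```javascript
var crypto = require('crypto')
var stream = require('stream')

// Hypothetical Transform: forwards every chunk unchanged while feeding it
// into a SHA-1 hash. The hex digest is available as `.digest` once the
// source has ended; until then it is null.
function createHashingStream() {
  var hash = crypto.createHash('sha1')
  var tap = new stream.Transform({
    transform: function (chunk, encoding, callback) {
      hash.update(chunk)
      callback(null, chunk)
    },
    flush: function (callback) {
      tap.digest = hash.digest('hex')
      callback()
    }
  })
  tap.digest = null
  return tap
}

// Compare the locally computed digest with the value reported by the server
// (assumed to be a hex string; case and surrounding whitespace are ignored).
function digestsMatch(localHex, serverHex) {
  return typeof localHex === 'string' &&
    typeof serverHex === 'string' &&
    localHex === serverHex.trim().toLowerCase()
}
```

Usage would be along the lines of `source.pipe(tap).pipe(uploadRequest)`, followed by `digestsMatch(tap.digest, hashFromResponse)` once the response has arrived. No temporary file is needed in this variant, because nothing has to be known before the first byte is sent.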

Is there an issue open @crate to support streaming of blobs (without sha upfront)?

megastef avatar Sep 02 '15 21:09 megastef

testing the graphql

sairamdevarashetty avatar Apr 25 '17 13:04 sairamdevarashetty