sharp Emit multiple data chunks from output Streams (was: Proper streaming)

Emit multiple data chunks from output Streams (was: Proper streaming)

Open LinusU opened this issue 9 years ago • 18 comments

It would be very cool if sharp could support proper streaming. As it is now, it will always emit exactly one data event which is the whole image. It kind of defeats the purposes of streams.

As I see it there is two ways to implement this.

The hard, but robust way.

Every time that libvips have data that should be written, it should call a function that we have defined. We then use this data to generate a data-event on the stream. It would be really cool if libvips could provide a state-machine that has a synchronous function that gives the next chunk. That way pause and unpause would work perfectly.
The quick and easy fix.

We can also create a named pipe and tell libvips to write it to that file via vips_image_write_to_file. To the user we would then return a ReadStream as returned by fs.createReadStream(path).

Mar 06 '15 13:03 LinusU

If you've not already done so, have a look at the Stream discussion in #30.

At the time, @jcupitt created a load-from-stream branch with some initial ideas, mainly around the creation of a VipsInputStream/VipsOutputStream.

On a related note, I spotted this recent change in io.js that looks like it could make handing JavaScript Stream objects in C++ easier.

Mar 06 '15 14:03 lovell

My final thought back then was in this comment: https://github.com/lovell/sharp/issues/30#issuecomment-46960443

Thinking about it again, it might be possible to do a hybrid of buffering and streaming. You could read and store data off the input until there was enough there that you could get all the metadata you need to be able to decide how to handle the file.

Once you've picked an approach, close the image and start again. This time you feed the reader the stored data from the first pass and only start reading off the input again when that runs out. At that point (presumably) you are decoding actual pixels.

There are still some awful problems. What if you send 10 jpg images down a pipe to a resizer, but image 4 is corrupt. How will the reader recover from the error?

Mar 06 '15 15:03 jcupitt

I'm sorry, I missed out the very vital part that is that I'm talking about output, not input.

I'm not that interested in being able to pipe more than one image and for that I think it would be more reasonable to use object-streams (objectMode = true) for. Anyhow, that's another discussion.

My use case is to stream the output image data to S3 (or to any other service for that matter...).

My ideal usage would be like this:

var out = sharp(bufferObject).resize(1080, 1080).min().toStream();
S3.upload({ Body: out, Key: 'my-image.jpg' }, cb);

This could be done as a said in number 2 by letting libvips write it to a named pipe and then createReadStream() on that one. That should give me data events as the data is normally written to disk, e.g. when fwrite(...) is called in C.

Actually, even more usual is probably piping directly back to the browser on http request.

server.on('request', function (req, res) {
  Sharp(myBuffer).resize(1080, 1080).min().toStream().pipe(res);
});

Why I like to have .toStream is because I think it makes it more explicit on what I want to do. In my humble opinion that is better than letting the sharp object always be a stream, which sometimes emits data.

Mar 06 '15 15:03 LinusU

Ah OK, that makes sense. I agree, this would be an nice cleanup to libvips. The load-from-stream branch does this already, I'll fix it up a bit and add it.

Mar 09 '15 12:03 jcupitt

@jcupitt So what's the resolution on this issue? It looks like the load-from-stream branch was closed.

Jul 14 '15 18:07 joe1chen

I spent a bit of time looking at the branch, but master has moved and merging it would have taken quite a lot of work.

Another alternative would be GIO. It has a thing called GFile which does more or less the same thing as VipsInputStream / VipsOutputStream, but better: a POSIX-y IO API which can have memory, files, sockets, URIs, or whatever you like as the backend:

https://developer.gnome.org/gio/stable/GFile.html

We could rework the png/jpg/webp read and write operations to use GFile. We'd then be able to open images directly over http or ftp, for example:

vips copy ftp://a.b.c/d.jpg x.jpg

No idea if that would be useful.

On the other hand ... It would mean adding yet another required dependency. And VIpsInputStream / VipsOutputStream were small and easy to understand and extend.

Jul 15 '15 10:07 jcupitt

@jcupitt great work on the streams branch, a shame it could never be merged. Just adding my support for any work on true streams, which would allow significantly less memory usage since right now sharp needs to load the source image into memory, and then hold the whole output buffer in memory, before writing out.

Feb 26 '16 03:02 gmaclennan

I'm trying to find a conclusion to this, did anyone succeed in getting sharp to write to a readableStream that could as suggested above be piped to a request from a browser.

Dec 17 '19 10:12 mitra42

libvips 8.9 (due rsn) has true streaming:

https://libvips.github.io/libvips/2019/11/29/True-streaming-for-libvips.html

eg.

aws s3 cp s3://mybucket/input.jpg - | \
    vips thumbnail_stream [descriptor=0] .jpg 128 | \
    aws s3 cp - s3://mybucket/output.jpg

Obviously you could send to a browser instead.

It would need some (unknown?) amount of work to integrate this in sharp. I think Lovell is planning to look at it when 8.9 finally ships.

Dec 17 '19 10:12 jcupitt

Yes, after there's a version of sharp that depends on the future libvips v8.9.0 we can look at the best way of handling its new Stream API.

A possible first implementation could involve accepting file descriptors as input/output, although that is likely to have to be via new API and not the standard, idiomatic Node.js Stream API (which will still be supported using the existing implementation).

A more complete solution might be to create custom libvips Stream objects and use these for passing data chunks to/from Node.js, but there are multi-threading considerations e.g. libvips will be running on a different thread to V8.

I'm planning to migrate the internals of sharp to use Node.js's newish N-API first - see #1282 - sadly I don't see anything in N-API for Node.js Stream objects, plus this migration will remove access to things like Node.js' StreamBase class.

Dec 17 '19 18:12 lovell

Huh that sounds a little frustrating. Do let me know if there's anything I can do to help.

Dec 17 '19 20:12 jcupitt

Thanks for the explanation - I'm not up enough on the details of the APIs to make intelligent comments on the choices made, but I think keeping a mind on the goal is important - i.e. to be able to interface with external (non-sharp) code.

In my case .... I had a Node stream with the image rather than a file, (being pulled from the net or a cache, or a P2P process) and needed to pass it back to the browser request. There are probably other requirements for other people but I'm guessing these node streams are the common theme to most of them rather than file descriptors.

Also ... can I suggest a minor change to the documentation to explain how to work with streams, it only took me a few minutes to find that sharp claimed to support streams, it took me a couple of hours - searching through git issues etc - to figure out how to plug it into streams, and I almost gave up when this thread implied (to me) that it wasn't supported (rather than what it actually says, which is that its supported but with bad memory handling).

I'd suggest an edit to https://sharp.pixelplumbing.com/en/stable/api-constructor/ maybe one of the examples, best I could figure out was the following code.

# Where s is the stream; and res is a browser Response. 
cost sharpie = sharp(); 
s.pipe(sharpie);
sharpie.extract({left,top,width,height})
    .pipe(res);

This works (subject to the memory issues) but wasn't intuitive to figure this out from the docs and I'm not at all sure its the right way to do it. The instinct was to look at "toFile", it took a while to realize that wasnt the right way.

Oh ... and thanks, once I figured this out, the library's saved me a bunch of work !

Dec 17 '19 20:12 mitra42

@mitra42 Glad you got it working; is this for use with the Internet Archive?

The second example in https://sharp.pixelplumbing.com/en/stable/api-constructor/#examples which is generated from https://github.com/lovell/sharp/blob/master/lib/constructor.js#L53 demonstrates Stream-based I/O and I'm always happy to accept PRs with more examples and improvements to the JSDocs if you're able.

Dec 18 '19 17:12 lovell

@lovell Oops - sorry I missed that example - not sure how.

Yes -this is for the Internet Archive - for the Offline Archive, a version of the archive running in devices as small as Raspberry Pi's in particular to support interaction between that box and a local copy of mediawiki running an extension built by David Kamholz for transcribing palm leaf books, it was using IIIF on the archive to zoom in on the full scan, and I needed to port that support to Javascript. I tried using IIIF via @jonallo 's iiif-image-server-node but the installer for that is broken (it fails to install sharp) and once I'd got sharp installed separately I realized sharp could probably be used better on its own than going through iiif-image-server-node > iiif-image

It took all of 25 lines of code ! So pretty happy with that. https://github.com/internetarchive/dweb-mirror/blob/master/mirrorHttp.js#L636

Dec 18 '19 21:12 mitra42

Yes, after there's a version of sharp that depends on the future libvips v8.9.0 we can look at the best way of handling its new Stream API.

A possible first implementation could involve accepting file descriptors as input/output, although that is likely to have to be via new API and not the standard, idiomatic Node.js Stream API (which will still be supported using the existing implementation).

A more complete solution might be to create custom libvips Stream objects and use these for passing data chunks to/from Node.js, but there are multi-threading considerations e.g. libvips will be running on a different thread to V8.

I'm planning to migrate the internals of sharp to use Node.js's newish N-API first - see #1282 - sadly I don't see anything in N-API for Node.js Stream objects, plus this migration will remove access to things like Node.js' StreamBase class.

@lovell is this still on the roadmap given sharp is now using libvips 8.9?

Apr 01 '20 15:04 summera

Do I understand correctly? The underlying cli command vips does supports streams and won't load entire file to start processing. But the Sharp library can't do it? So in Node.js I have to do all in Linux Shell land to achieve it? Maybe I could run vips shell command as child_process.spawn and connect the child process to Node streams via its stdin/stdout?

Jun 27 '22 22:06 germansokolov13

As https://github.com/libvips/libvips/pull/1443 has been merged is there any progress to make streams fully work in sharp as well? 😇

I'm currently already using sharp for profile image upload though they have a much lower max size than the images we want to allow in content so real streaming straight from an xhr upload straight from the browser and then piping to R2 would best.

Sep 19 '23 03:09 CanRau

This improvement will greatly enhance the end user experience.

In our current process, we download the source image and pipe it to Sharp for resizing. Then, we pipe the resized image to an S3 bucket and simultaneously as a chunked http response. However, there is currently a delay because Sharp only emits one large chunk. By implementing proper stream emission, the final image will be delivered to end users more quickly.

Feb 16 '24 10:02 exentrich

sharp sharp copied to clipboard

Emit multiple data chunks from output Streams (was: Proper streaming)

sharp
sharp copied to clipboard