pyvips

Read from s3?

Open misteliy opened this issue 6 years ago • 7 comments

Can we read directly from S3 without downloading the file locally? In my case it's an SVS file of roughly 1 GB. Cheers

misteliy avatar Dec 05 '19 17:12 misteliy

libvips 8.9 has a feature which might allow this:

https://libvips.github.io/libvips/2019/11/29/True-streaming-for-libvips.html

You'd need to write a small adapter class to do read and seek on large S3 objects.
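For anyone following along, here is a minimal sketch of what such an adapter might look like. `RangeReader`, `open_image`, and the `fetch_range(offset, length)` callable are hypothetical names made up for illustration; the pyvips part assumes a pyvips and libvips recent enough to expose `SourceCustom`, and it only helps with loaders that accept a source.

```python
import io


class RangeReader:
    """File-like read/seek over a remote object fetched by byte range.

    fetch_range(offset, length) -> bytes is a hypothetical callable supplied
    by the caller; size is the total object size in bytes.
    """

    def __init__(self, fetch_range, size):
        self.fetch_range = fetch_range
        self.size = size
        self.pos = 0

    def read(self, length):
        # Never ask for bytes past the end of the object.
        length = min(length, self.size - self.pos)
        if length <= 0:
            return b""  # end of data
        data = self.fetch_range(self.pos, length)
        self.pos += len(data)
        return data

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self.pos + offset
        elif whence == io.SEEK_END:
            pos = self.size + offset
        else:
            raise ValueError("unsupported whence: %r" % (whence,))
        if pos < 0:
            raise ValueError("negative seek position")
        self.pos = pos
        # The seek handler is expected to return the new position.
        return self.pos


def open_image(reader, **kwargs):
    """Wire the reader into pyvips (imported lazily; assumes SourceCustom exists)."""
    import pyvips

    source = pyvips.SourceCustom()
    source.on_read(reader.read)
    source.on_seek(reader.seek)
    return pyvips.Image.new_from_source(source, "", **kwargs)
```

Every `read` here is one remote request, so this is only a starting point.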

jcupitt avatar Dec 05 '19 17:12 jcupitt

Actually, having said that, openslide will not work with that new stream API, unfortunately.

You'll need to download the whole SVS until the openslide library allows remote read.

jcupitt avatar Dec 05 '19 17:12 jcupitt

Sorry, I should reply once and think a little longer.

SVS is a TIFF file, so all you'd need to do is swap TIFFOpen for TIFFClientOpen and implement callbacks for read-and-seek-from-URI.

jcupitt avatar Dec 05 '19 17:12 jcupitt

Okay, great. To give a bit more context, the ultimate goal is to do some pre-processing in AWS Lambda (S3-triggered), so it's crucial to load only a certain level of the SVS file, similar to:

level = pyvips.Image.new_from_file(filename, level=0)

The problem is that we can't download the image to Lambda since it only offers 512 MB of /tmp. Hence streaming only the relevant layer would be ideal. Hopefully I'll find some time this coming weekend to look into this. Thanks a lot for the hints.

misteliy avatar Dec 05 '19 17:12 misteliy

How well does S3 handle random seek and read? Does it use HTTP range requests?

jcupitt avatar Dec 05 '19 17:12 jcupitt

S3 APIs support the HTTP Range: header (see RFC 2616), which takes a byte-range argument.

Sample S3 call: aws s3api get-object --bucket my_bucket --key path/to/my/file/file1.gz file1.gz --range bytes=1000-2000
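The same ranged read can be sketched from Python. This assumes a boto3 S3 client is passed in as `client` (for example one created with `boto3.client("s3")`); `range_header` and `fetch_range` are hypothetical helper names. Note that HTTP byte ranges are inclusive at both ends, so `bytes=1000-2000` asks for 1001 bytes.

```python
def range_header(offset, length):
    """Build an HTTP Range value for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # Both ends of an HTTP byte range are inclusive.
    return "bytes=%d-%d" % (offset, offset + length - 1)


def fetch_range(client, bucket, key, offset, length):
    """Fetch part of an S3 object using a ranged GetObject call.

    `client` is assumed to be a boto3 S3 client, or anything with a
    compatible get_object method.
    """
    response = client.get_object(
        Bucket=bucket,
        Key=key,
        Range=range_header(offset, length),
    )
    return response["Body"].read()
```

Taking the client as a parameter keeps the helper easy to exercise with a stub before pointing it at a real bucket.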

misteliy avatar Dec 05 '19 18:12 misteliy

That's good. You'll probably find you need a caching layer. TIFF makes a lot of random reads quite close to each other and it'll be horribly slow if you make a round trip for each one.
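One way such a caching layer could look, as a rough sketch rather than a tuned implementation: fetch fixed-size aligned blocks and keep the most recently used ones in memory, so that nearby small reads hit the same block. `BlockCache` and the `fetch_range(offset, length)` callable are hypothetical names, and the block size and block count below are arbitrary guesses.

```python
from collections import OrderedDict


class BlockCache:
    """LRU cache of fixed-size blocks in front of a ranged fetch function."""

    def __init__(self, fetch_range, size, block_size=1 << 20, max_blocks=32):
        self.fetch_range = fetch_range  # hypothetical: (offset, length) -> bytes
        self.size = size                # total object size in bytes
        self.block_size = block_size
        self.max_blocks = max_blocks
        self.blocks = OrderedDict()     # block index -> bytes, oldest first

    def _block(self, index):
        if index in self.blocks:
            self.blocks.move_to_end(index)  # mark as most recently used
            return self.blocks[index]
        start = index * self.block_size
        length = min(self.block_size, self.size - start)
        data = self.fetch_range(start, length)  # one round trip per block
        self.blocks[index] = data
        if len(self.blocks) > self.max_blocks:
            self.blocks.popitem(last=False)     # evict least recently used
        return data

    def read_at(self, offset, length):
        """Return up to `length` bytes starting at `offset`."""
        if offset < 0:
            raise ValueError("negative offset")
        end = min(offset + length, self.size)
        parts = []
        while offset < end:
            index, skip = divmod(offset, self.block_size)
            block = self._block(index)
            take = min(end - offset, len(block) - skip)
            if take <= 0:
                break  # short block from the backend; stop rather than loop
            parts.append(block[skip:skip + take])
            offset += take
        return b"".join(parts)
```

With this in front of the remote fetch, a burst of small reads inside the same block costs a single request.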

jcupitt avatar Dec 05 '19 21:12 jcupitt