Streaming object insert and get, please

nlfiedler opened this issue 3 years ago · 3 comments

I can use memmap to effectively stream a large file when calling create_object(), but get() always returns its result as a Vec<u8>. I see there are commented-out "writer" and "reader" functions, so I'm filing this request to track the need for this feature. For my use case, I'm always going to be dealing with files that are 64 MB or larger, so streaming would be good.

P.S. The google_storage1 crate defines a ReadSeek trait that is used for uploading files. For downloads, I think they rely on hyper, enabling std::io::copy() directly to a file.
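
For reference, here's roughly what my upload workaround looks like (a sketch using the memmap2 fork; the exact create_object() signature and error conversions are assumed):

use memmap2::Mmap;
use std::fs::File;

async fn upload(bucket: &mut Bucket, path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open(path)?;
    // Safety: the file must not be modified or truncated while mapped.
    let mmap = unsafe { Mmap::map(&file)? };
    // Assumed signature: create_object(name, data, mime_type).
    bucket.create_object("my-object", &mmap[..], "application/octet-stream").await?;
    Ok(())
}

The download side has no equivalent trick, since get() buffers the entire object.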

nlfiedler avatar Mar 25 '21 04:03 nlfiedler

Hello!

I agree that the ability to read and write GCS objects in a streaming fashion is definitely valuable.
The main thing that blocked the implementation was that it was unclear what the API for it should be.

I am currently considering the following API:

impl Object {
    // `ObjectReader` would implement `futures_io::AsyncRead`
    pub async fn reader(&mut self) -> Result<ObjectReader, Error> {
        // ...
    }

    // `ObjectWriter` would implement `futures_io::AsyncWrite`
    pub async fn writer(&mut self, mime_type: impl AsRef<str>) -> Result<ObjectWriter, Error> {
        // ...
    }
}
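
For illustration, streaming an object straight to disk with the first design could look like this (a hypothetical sketch: it assumes an async-std file, futures::io::copy, and that the crate's Error type converts into Box<dyn std::error::Error>):

use async_std::fs::File;
use futures::io::{self, AsyncWriteExt};

async fn download_to_file(object: &mut Object, path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let reader = object.reader().await?;
    let mut file = File::create(path).await?;
    // copy() pulls fixed-size chunks from the AsyncRead and forwards them
    // to the AsyncWrite, so the whole object never sits in memory at once.
    io::copy(reader, &mut file).await?;
    file.flush().await?;
    Ok(())
}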

But other crates sometimes go for an API that resembles the following:

impl Object {
    // Asynchronously streams the bytes from the GCS object into the provided writer.
    pub async fn streaming_get<W: AsyncWrite>(&mut self, writer: W) -> Result<(), Error> {
        // ...
    }

    // Asynchronously streams the bytes from the provided reader into the GCS object.
    pub async fn streaming_put<R: AsyncRead>(&mut self, mime_type: impl AsRef<str>, reader: R) -> Result<(), Error> {
        // ...
    }
}

I was more inclined to implement the first design rather than the second, because the second moves the iteration process out of your control, which makes it harder to iterate over the bytes manually without some kind of AsyncRead/AsyncWrite in-memory pipe, like the one from sluice.

But I suspect that even the first design might require this kind of in-memory IO pipe to implement the writer method.
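
To make that concrete, here is roughly what manual iteration would look like under the second design, using sluice's pipe (a sketch; nothing here is final API):

use futures::io::AsyncReadExt;
use sluice::pipe::pipe;

async fn iterate_bytes(object: &mut Object) -> Result<(), Error> {
    let (mut reader, writer) = pipe();
    // The download must be driven concurrently with the manual reads,
    // otherwise the bounded pipe fills up and both sides stall.
    let download = object.streaming_get(writer);
    let consume = async {
        let mut buf = [0u8; 8192];
        loop {
            let n = reader.read(&mut buf).await.expect("pipe read failed");
            if n == 0 { break; }
            // process &buf[..n] manually here
        }
    };
    let (result, ()) = futures::join!(download, consume);
    result
}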

Hirevo avatar Mar 25 '21 15:03 Hirevo

Hi!

I'm trying to implement the storage API right now in the following project of mine: https://github.com/Roba1993/stow. Maybe you can get some ideas on how to solve it there. I went with an AsyncRead for both file get and put, which works quite nicely. Roughly, the shape is shown in the sketch below.
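
This is an illustrative sketch of that shape, not stow's actual signatures:

use futures::io::AsyncRead;

#[async_trait::async_trait]
pub trait Location {
    // Returns an AsyncRead over the remote object's bytes, so the caller
    // decides how to buffer them or where to copy them.
    async fn get(&self, path: &str) -> Result<Box<dyn AsyncRead + Unpin + Send>, std::io::Error>;

    // Uploads whatever bytes the given AsyncRead yields.
    async fn put(&self, path: &str, data: Box<dyn AsyncRead + Unpin + Send>) -> Result<(), std::io::Error>;
}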

Roba1993 avatar Apr 10 '21 10:04 Roba1993

We use rusoto_s3 for pushing to Cloud Storage using the S3-compatible API. It works pretty well and supports streaming bodies. Here's a decent example from their integration tests: https://github.com/rusoto/rusoto/blob/master/integration_tests/tests/s3.rs#L865
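
Condensed sketch of that pattern, pointed at GCS's S3-compatible endpoint (adapted loosely from the linked test; the bucket name, HMAC credential setup, and the tokio-util framing are assumptions):

use futures::TryStreamExt;
use rusoto_core::{ByteStream, Region};
use rusoto_s3::{PutObjectRequest, S3, S3Client};
use tokio_util::codec::{BytesCodec, FramedRead};

async fn upload(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let file = tokio::fs::File::open(path).await?;
    // Turn the file into a Stream of Bytes chunks; the request body is
    // streamed from it, never fully buffered.
    let stream = FramedRead::new(file, BytesCodec::new()).map_ok(|b| b.freeze());
    // Point the S3 client at GCS's S3-compatible endpoint.
    let client = S3Client::new(Region::Custom {
        name: "auto".to_string(),
        endpoint: "https://storage.googleapis.com".to_string(),
    });
    client
        .put_object(PutObjectRequest {
            bucket: "my-bucket".to_string(),
            key: "my-object".to_string(),
            body: Some(ByteStream::new(stream)),
            ..Default::default()
        })
        .await?;
    Ok(())
}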

abonander avatar Jan 27 '22 23:01 abonander