
RUST-1871 Convert gridfs to a fluent API

Open abr-egn opened this issue 1 year ago • 2 comments

RUST-1871

I opted to omit the versions of the upload/download methods that accept a stream - while the spec has both, IMO in the case of Rust they're both unidiomatic and unnecessary. For simple cases, write_all / read_to_end do the same thing, and the more general case is covered by futures_util::io::copy.

Also, previously open_upload_stream was both non-async and infallible; I think that was an accident of implementation rather than an API guarantee we want to uphold, so I've made it consistent with open_download_stream.

abr-egn avatar Mar 20 '24 19:03 abr-egn

Thought about this more and I think we should keep both APIs. AsyncRead and AsyncWrite both only work with in-memory data, which means that if a user wants to download from or upload to a file they'd either need to load it into memory all at once or do their own chunking logic to avoid high memory usage. The download_to_writer and upload_from_reader methods only load one file chunk into memory at a time.

isabelatkinson avatar Mar 25 '24 15:03 isabelatkinson

> AsyncRead and AsyncWrite both only work with in-memory data, which means that if a user wants to download from or upload to a file they'd either need to load it into memory all at once or do their own chunking logic to avoid high memory usage.

I don't think that's accurate 🙂 While the gridfs AsyncRead/AsyncWrite impls only read into / write from an in-memory buffer, that buffer is provided by the caller and doesn't need to be the size of the whole file.

For example, futures_util::io::copy internally uses a BufReader, which reads blocks of 8 KiB* at a time from the source and writes them to the destination stream. This means that running copy with a GridFsDownloadStream as the source and a tokio File as the destination is processed as if it were (pseudocode):

const BUF_SIZE: usize = 8192;
while let Some(gridfs_chunk) = source.next_chunk().await {
  // Only one gridfs chunk is resident in memory at a time.
  for write_buf in gridfs_chunk.chunks(BUF_SIZE) {
    dest.write_all(write_buf).await;
  }
}

* This can be tweaked by the user if needed by creating their own BufReader and calling copy_buf.

abr-egn avatar Mar 25 '24 16:03 abr-egn

Ah, I wasn't aware of copy -- I was thinking more about users working directly with the methods in AsyncRead(Ext) and AsyncWrite(Ext). Our documentation for working with the streams (here and here) currently points users towards those methods, so if we're no longer going to support the alternative API, I suggest adding some instructions for working with large files to those docs -- linking to copy would probably be sufficient.

isabelatkinson avatar Mar 26 '24 16:03 isabelatkinson

Good thought, added notes to those docs about copy.

abr-egn avatar Mar 26 '24 19:03 abr-egn