
Dynamically update `part_size` for large uploads

jchorl opened this issue on Aug 12, 2023 · 2 comments

Tell us more about this new feature.

Submitting a suggestion.

Currently, very large uploads fail with:

2023-08-12T15:25:30.052503Z  WARN write{req=1280011 ino=2 fh=1 offset=83886080000 length=65536}: mountpoint_s3::fuse: write failed: upload error: object exceeded maximum upload size of 83886080000 byte

And `cp` reports:

$ cp large.bin target/large1.bin
cp: error writing ‘target/large1.bin’: File too large
cp: failed to extend ‘target/large1.bin’: File too large

This limit is rightly documented in https://github.com/awslabs/mountpoint-s3/blob/b65eda8e26da85f90a5696f38715eeb67e64c409/doc/CONFIGURATION.md#maximum-object-size along with a proposed configuration flag. Thanks!

When Mountpoint receives a copy request for a large file (say, 500 GB), we already know that an 8 MiB part size will run up against S3's 10,000-part limit.

Can Mountpoint detect this case and simply set the part size high enough to support the transfer?

Is there a use case where a user would want this to fail, opting for a hard failure over slightly reduced performance?

If we agree it's a good step from a usability standpoint, how hard would it be to integrate into Mountpoint technically?
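For context, the arithmetic is easy to check. A quick sketch (the constant and helper functions here are illustrative, not Mountpoint APIs):

```rust
// S3 multipart uploads allow at most 10,000 parts, so the maximum
// object size is fixed once the part size is chosen.
const MAX_PARTS: u64 = 10_000;

/// Largest object that fits in MAX_PARTS parts of `part_size` bytes.
fn max_object_size(part_size: u64) -> u64 {
    part_size * MAX_PARTS
}

/// Smallest part size (rounded up) that lets an object of
/// `object_size` bytes fit within the part-count limit.
fn min_part_size(object_size: u64) -> u64 {
    (object_size + MAX_PARTS - 1) / MAX_PARTS
}

fn main() {
    // Mountpoint's default part size is 8 MiB; this reproduces the
    // 83886080000-byte cap from the log message above.
    let default_part: u64 = 8 * 1024 * 1024;
    println!("cap with 8 MiB parts: {} bytes", max_object_size(default_part));

    // A 500 GB upload would need parts of at least 50 MB.
    let five_hundred_gb: u64 = 500 * 1_000_000_000;
    println!("min part size for 500 GB: {} bytes", min_part_size(five_hundred_gb));
}
```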

jchorl avatar Aug 12 '23 15:08 jchorl

Thanks Josh, this is an interesting idea. I'll share with the team.

I'm not aware of a use case where a user would want this to fail at the moment. I think it's a question of whether and how mountpoint-s3 can know that a write will exceed a certain size, either at file creation time or by adapting its part size as the file grows across subsequent writes.

dannycjones avatar Aug 14 '23 09:08 dannycjones

Unfortunately FUSE/POSIX don't give us a way to know the file size in advance (other than copy_file_range if you're copying from a file already in Mountpoint), so we don't have a ton of choices here. The right thing to do is probably to dynamically scale up the part size as the upload progresses.

Upload part size is controlled by the CRT: https://github.com/awslabs/aws-c-s3/blob/a691a2fcb49543c79cf9332c7c5dafab0dff6b97/source/s3_auto_ranged_put.c#L721-L737. `has_content_length` is always false for us, so in principle we'd just need a method that lets us change `meta_request->part_size`, which is otherwise private. So this change is not super complicated, but it is a bit involved since it requires changes on the CRT side. Really, the CRT should just handle this for us in the streaming upload case, but I don't know how complicated that would be.
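For what it's worth, one shape the dynamic scaling could take (a rough sketch only; `AdaptivePartSizer` and everything in it are hypothetical names, not actual Mountpoint or CRT APIs). S3 allows a multipart upload to mix part sizes, so parts already uploaded are unaffected when the size changes mid-stream:

```rust
// Hypothetical sketch: double the part size every time half of the
// remaining part budget has been spent. Each "epoch" then carries
// roughly as many bytes as the one before it, so total capacity grows
// without bound while the part count converges toward the limit.
const MAX_PARTS: u64 = 10_000;

struct AdaptivePartSizer {
    part_size: u64,
    parts_completed: u64,
    next_growth_at: u64,
}

impl AdaptivePartSizer {
    fn new(initial_part_size: u64) -> Self {
        Self {
            part_size: initial_part_size,
            parts_completed: 0,
            // First doubling after half the part budget is used.
            next_growth_at: MAX_PARTS / 2,
        }
    }

    /// Returns the size to use for the next part, doubling it whenever
    /// the upload has consumed half of the remaining part budget.
    fn next_part_size(&mut self) -> u64 {
        if self.parts_completed >= self.next_growth_at {
            self.part_size *= 2;
            self.next_growth_at += (MAX_PARTS - self.next_growth_at) / 2;
        }
        self.parts_completed += 1;
        self.part_size
    }
}

fn main() {
    // Start from the 8 MiB default and see how far 9,999 parts reach.
    let mut sizer = AdaptivePartSizer::new(8 * 1024 * 1024);
    let mut total: u64 = 0;
    for _ in 0..9_999 {
        total += sizer.next_part_size();
    }
    println!("bytes representable in 9,999 parts: {total}");
}
```

With a fixed 8 MiB part size, 9,999 parts cap out below 80 GiB; under this scheme the same part budget covers far more, at the cost of larger in-memory part buffers late in the upload.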

jamesbornholt avatar Aug 14 '23 19:08 jamesbornholt