
Large files / many files issues when using filesystem

Open Jayd603 opened this issue 2 years ago • 8 comments

It appears that directory listings are recursive, i.e., s3proxy reads in all files before presenting anything to the client. Even listing the root directory triggers a full traversal before the list is shown. If the sub-directories contain many files, listing can take a while even when only one directory needs to be shown, especially when the filesystem base dir points to a network share mount.

It also seems like s3proxy limits the request size, so if xbcloud tries to send 100 MB chunks (via --read-buffer-size), the requests fail.

blocks.ibd.00000000000000000000, size: 52428858
221117 20:05:05 xbcloud: S3 error message: <Error><Code>MaxMessageLengthExceeded</Code><Message>Your request was too big.</Message><RequestId>4442587FB7D0A2F9</RequestId></Error>
221117 20:05:05 xbcloud: error: failed to upload chunk

Edit: these should probably be separate issues, my bad. For the directory listings: if I list a directory through s3proxy that has no sub-directories, it's fast. This is why I suspect that gathering all file info recursively before display is the problem. The slowness is amplified when using a network share as the filesystem, which is where it really becomes noticeable; the filesystem backend code was probably not written with network shares in mind. It may still be doing things inefficiently regardless, though.

Jayd603 avatar Nov 18 '22 01:11 Jayd603

These are separate issues.

S3Proxy limits the size of non-chunked requests to 32 MB by default, but you can override this via the s3proxy.v4-max-non-chunked-request-size property. Please open a new issue if AWS has a different limit, because I set this several years ago. Another workaround is to use multipart uploads.
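For example, a minimal s3proxy.conf sketch; the 128 MB value here is only an illustration, picked to clear xbcloud's 100 MB chunks:

```properties
# Override the 32 MB default for non-chunked V4 requests.
# 134217728 bytes = 128 MB (example value, not a recommendation).
s3proxy.v4-max-non-chunked-request-size=134217728
```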

S3Proxy does list all the blobs on the filesystem by default. This is an unfortunate limitation within Apache jclouds. You can see how this works by looking at LocalBlobStore.list and FilesystemStorageStrategyImpl.getBlobKeysInsideContainer. You can optimize this by pushing down the delimiter from the former into the latter, which would vastly improve performance. The fix is not particularly difficult but requires working through a few levels of abstraction in Java.
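To make the push-down concrete, here is a hypothetical Java sketch of the difference between the current full recursive walk and a delimiter-aware, single-level listing; the method names are illustrative and not the actual jclouds API:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ListingSketch {
    // Roughly today's behavior: enumerate every key under the container,
    // even when the caller only asked for one level with delimiter="/".
    static List<Path> listAllKeys(Path containerDir) throws IOException {
        try (Stream<Path> walk = Files.walk(containerDir)) {
            return walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    // With the delimiter pushed down: only the immediate children of the
    // requested prefix are touched, so large sub-trees (e.g. on a network
    // share) are never traversed.
    static List<Path> listOneLevel(Path prefixDir) throws IOException {
        List<Path> results = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(prefixDir)) {
            for (Path entry : stream) {
                results.add(entry); // directories become common prefixes, files become keys
            }
        }
        return results;
    }
}
```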

gaul avatar Dec 23 '22 07:12 gaul

Upstream issue: JCLOUDS-1371

gaul avatar Dec 26 '22 06:12 gaul

Hey, I am having a similar issue. I am trying to figure out what values I can supply to this:

- name: S3PROXY_V4_MAX_NON_CHUNKED_REQ_SIZE
  value: ???

The error I am getting is:

An error occurred (MaxMessageLengthExceeded) when calling the PutObject operation: Your request was too big.

Can I edit this env somehow in Kustomize, or do I have to build my own image by changing some Java files?
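What I have in mind is a Kustomize strategic merge patch along these lines (the deployment and container names here are placeholders, and I do not know yet whether the image actually reads this variable):

```yaml
# Patch referenced from kustomization.yaml; names below are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: s3proxy
spec:
  template:
    spec:
      containers:
        - name: s3proxy
          env:
            - name: S3PROXY_V4_MAX_NON_CHUNKED_REQ_SIZE
              value: "134217728"  # 128 MB as a string; example value only
```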

Hackmeat avatar Jan 19 '23 13:01 Hackmeat

@Jayd603 Please test the latest master (git clone followed by mvn package) and report back your experiences with listing directories. I'm not sure if anything can be done to improve large multipart uploads, so let's leave this open to see if others have clever ideas. I will run another S3Proxy release after jclouds does a release, probably in 1-3 months.

gaul avatar Jan 28 '23 13:01 gaul

#471 tracks the poor performance with large multipart uploads.

gaul avatar Jan 30 '23 05:01 gaul

@Hackmeat This is a different symptom. Please open another issue and share the different behavior from AWS.

gaul avatar Jan 30 '23 05:01 gaul

@gaul I see the jclouds update was reverted. Is the issue fixed or not?

Upanshu11 avatar Oct 26 '23 06:10 Upanshu11

Reopening this until I can address the jclouds issue.

gaul avatar Nov 17 '23 04:11 gaul