Increase copy speed by re-enabling buffering for multithreaded copies from local filesystems
What is the purpose of this change?
Make multithreaded copies from local to S3 fast again (by re-enabling buffering)!
Between v1.64.2 and v1.65+, the performance of multithreaded copies from local to S3 decreased. After some investigation and debugging, it seems to be a side effect of this commit / issue https://github.com/rclone/rclone/issues/7350 - more specifically, of disabling buffering when copying from local filesystems.
I ran some tests on an m6a.2xlarge AWS EC2 instance (network up to 12.5 Gbps and EBS up to 10 Gbps) and here are the results:
Copy of a 64 GB file from the local filesystem to S3
time ./rclone copyto --s3-no-check-bucket --ignore-checksum --s3-disable-checksum --progress --s3-upload-cutoff=0 --multi-thread-cutoff=256M --multi-thread-streams 20 --disable=copy --no-check-dest <Local File> <S3 Bucket>
| Buffering | Time | Avg Speed |
|---|---|---|
| disabled | 18m 46s | ~58 MB/s |
| enabled | 9m 10s | ~121 MB/s |
I'm not completely sure about the memory consumption implications, but if buffering cannot be enabled by default, could we consider making it configurable?
What do you think? Looking forward to some input and feedback!
Was the change discussed in an issue or in the forum before?
No
Checklist
- [x] I have read the contribution guidelines.
- [x] I have added tests for all changes in this PR if appropriate.
- [x] I have added documentation for the changes if appropriate.
- [x] All commit messages are in house style.
- [x] I'm done, this Pull Request is ready for review :-)
Hmm, interesting. Fundamentally the disk should read at the same speed into the s3 multipart buffer (how it is at the moment) or into a memory buffer (like it used to be). Given that disk read speeds > network speeds why is this making a difference?
My guess is that it is because the s3 backend reads each block 3 times (once to MD5 it, once to sign it and once to send it). Before it read once off disk and twice out of RAM. Now it is reading 3 times off disk.
The OS should have cached the 2nd and 3rd reads though, but we may well have disabled that with fadvise.
Try this patch and see if it makes a difference.
diff --git a/backend/local/local.go b/backend/local/local.go
index 14effd2a9..adc4568c3 100644
--- a/backend/local/local.go
+++ b/backend/local/local.go
@@ -1350,7 +1350,7 @@ func (o *Object) Open(ctx context.Context, options ...fs.OpenOption) (in io.Read
if err != nil {
return
}
- wrappedFd := readers.NewLimitedReadCloser(newFadviseReadCloser(o, fd, offset, limit), limit)
+ wrappedFd := readers.NewLimitedReadCloser(fd, limit)
if offset != 0 {
// seek the object
_, err = fd.Seek(offset, io.SeekStart)
Disabling fadvise was discussed in https://github.com/rclone/rclone/issues/7886 - maybe we should.
Second guessing the OS is probably a bad idea since I'm sure the linux kernel developers are better at memory management than me :-)
Hi @ncw, thanks for your reply. Your explanation actually makes a lot of sense (way better than the theory I came up with 😅). I tried your patch and got basically the same improvement (~2x speed increase) 🎉
Same test (v1.70.3 with fadvise disabled):
Transferred: 64.297 GiB / 64.297 GiB, 100%, 110.439 MiB/s, ETA 0s
Transferred: 1 / 1, 100%
Elapsed time: 8m50.0s
real 8m50.098s
So my impression based on the issue you linked is that the "proper" way forward here would be to make fadvise configurable - if that's the case I would close this PR.
It may be beyond my Linux / Golang skills, but I could give https://github.com/rclone/rclone/issues/7886 a try and make it configurable.
Closing this PR as there is a bigger discussion about this topic here: https://github.com/rclone/rclone/pull/8723
hi, @ncw , I'm re-opening this issue because after running more tests in a different setup, it seems fadvise is not the only thing impacting speeds 😅
I'm running a setup in Kubernetes where some SMB volumes are mounted in the nodes. I'm using rclone to transfer files from this storage to S3. Since the volumes are mounted in the nodes, rclone uses the "local" backend (not the SMB one).
Running the same test (64 GB file upload to S3 - g4dn.4xlarge EC2 instances in AWS Outpost):
| Setting | Avg Speed |
|---|---|
| fadvise disabled | ~165 MB/s |
| buffering re-enabled | ~310 MB/s |
| buffering re-enabled + fadvise disabled | ~347 MB/s |
I think since we are reading data from a network disk (SMB), disabling buffering has a huge impact when doing the multithreaded uploads to S3. What do you think?
This is the conclusion that the restic project came to - disable fadvise and use RAM buffering for max performance.
I don't want to balloon the memory usage of rclone though - there was a reason we did this (issue #7350). That issue has a lot of good stuff in it and is worth a read.
Adapting the table from #7350, this is what is actually implemented now:

- **RAM**: Buffer in RAM.
- **None**: Don't buffer, but re-read from the source if necessary.

The logic is RAM unless:

- the destination supports `OpenWriterAt`, eg local, azurefiles, smb, pcloud => None
- the source is local => None
- the destination supports `OpenChunkWriter` and promises not to seek its chunks except for retries, eg b2 => None
| Source backend | Destination backend | Buffering |
|---|---|---|
| local | any | None¹ |
| any | local/azurefiles/smb/pcloud | None² |
| any | b2 | None |
| any | s3/azureblob/oos | RAM |
And the notes from before
¹ Needs performance testing to see if it slows stuff down a lot! Might need to be RAM. (Update: yes, it does slow stuff down.)
² It works like this at the moment as the local backend never needs retries. (OpenWriterAt doesn't read the data twice)
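The rules above can be expressed as a small decision function. This is an illustrative sketch of the logic as described, not rclone's actual API - the function and parameter names are hypothetical:

```go
package main

import "fmt"

// chooseBuffering sketches the multi-thread copy buffering rules described
// above. All identifiers here are illustrative, not rclone's real ones.
func chooseBuffering(dstHasOpenWriterAt, srcIsLocal, dstChunkWriterNoSeek bool) string {
	switch {
	case dstHasOpenWriterAt: // eg local, azurefiles, smb, pcloud destinations
		return "None"
	case srcIsLocal: // local source: re-read from disk instead of buffering
		return "None"
	case dstChunkWriterNoSeek: // eg b2: chunks only re-read on retries
		return "None"
	default: // eg s3, azureblob, oos destinations
		return "RAM"
	}
}

func main() {
	fmt.Println(chooseBuffering(false, true, false))  // local -> s3: None
	fmt.Println(chooseBuffering(false, false, false)) // non-local -> s3: RAM
}
```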
Does that mean we should make it configurable? I hesitate to add yet another configuration flag for the poor users though. #7350 suggests a --low-memory flag which would make sure we used disk buffering for local reads.
Perhaps a more targeted flag like `--multi-thread-low-memory` which, if true, uses the None strategy, making the rules:

The logic is RAM unless:

- the destination supports `OpenWriterAt`, eg local, azurefiles, smb, pcloud => None
- the source is local and `--multi-thread-low-memory` is true => None
- the destination supports `OpenChunkWriter` and promises not to seek its chunks except for retries, eg b2 => None
We do now have `--max-buffer-memory`, which can be used to control how much memory rclone uses. That works pretty well and could be used instead of `--multi-thread-low-memory`.
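The idea behind capping buffer memory can be sketched as a fixed pool of chunk buffers behind a semaphore. This is an illustrative sketch, not how rclone actually implements `--max-buffer-memory`:

```go
package main

import "fmt"

// memPool caps total buffer memory: a buffered channel acts as a semaphore
// over a fixed number of pre-allocated chunk-sized slots.
type memPool struct{ slots chan []byte }

func newMemPool(maxBytes, chunkSize int) *memPool {
	n := maxBytes / chunkSize
	p := &memPool{slots: make(chan []byte, n)}
	for i := 0; i < n; i++ {
		p.slots <- make([]byte, chunkSize)
	}
	return p
}

// Get blocks until a chunk buffer is free, bounding memory in use;
// Put returns a buffer to the pool for reuse.
func (p *memPool) Get() []byte  { return <-p.slots }
func (p *memPool) Put(b []byte) { p.slots <- b }

func main() {
	pool := newMemPool(64<<20, 16<<20) // 64 MiB cap, 16 MiB chunks => 4 slots
	b := pool.Get()
	fmt.Println("got chunk of", len(b), "bytes; free slots:", len(pool.slots))
	pool.Put(b)
}
```

An upload worker would `Get` before reading a chunk and `Put` after the part is sent, so at most `maxBytes` of chunk memory is ever in flight regardless of `--multi-thread-streams`.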
What do you think @vitorog ?
Strictly for SMB (https://linux.die.net/man/8/mount.cifs), there also seem to be 2 options available that users can tune themselves:

- enabling FS-Cache via the `fsc` option
- setting `cache=loose` rather than the default value of `cache=strict`
I wonder if either of these would get you the same performance as rclone's RAM buffering without needing any code changes.
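For reference, a mount invocation using those options might look like this (illustrative config only - the server, share, mountpoint and credentials file are placeholders, `fsc` also needs the cachefilesd daemon running, and behaviour varies by kernel/cifs version):

```shell
# Mount an SMB share with FS-Cache enabled and loose caching semantics.
sudo mount -t cifs //server/share /mnt/share \
    -o fsc,cache=loose,credentials=/etc/smb-credentials
```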