`copy_file_range` / `cp --reflink=auto` broken on glibc 2.41/2.42 when used with ZFS

Open Fabian-Gruenbichler opened this issue 4 months ago • 6 comments

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Proxmox VE |
| Distribution Version | 9.0 |
| Kernel Version | 6.14.8-2-pv |
| Architecture | x86_64 |
| OpenZFS Version | 2.3.3 |

Describe the problem you're observing

this is mostly a heads-up, and not an actionable bug report for you folks - I hope it is still appreciated since it might help triaging incoming reports quicker.

copying a large file from ZFS using `cp` can corrupt the copy if the file's data/hole layout is of a specific kind. after some analysis by @Blub, it turns out this is not a ZFS bug, but a glibc one affecting glibc 2.41 and 2.42.

it seems likely that ZFS is the only file system affected, since most (all?) other file systems will not return such large offsets: they process at most 32-bit wide chunks in one call and thus avoid the truncation. it also seems likely that `cp` (with `--reflink=auto`, which is the default in modern coreutils) is not the only caller of `copy_file_range` affected by this glibc bug.
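
To make the truncation concrete, here is a minimal sketch of what happens to a byte count once it no longer fits in a signed 32-bit integer. This is a generic illustration of that class of bug, not glibc's actual code; the helper name `through_int32` is made up for this example.

```python
import ctypes

INT_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold


def through_int32(nbytes: int) -> int:
    """Return nbytes as it looks after being squeezed through a signed 32-bit int.

    ctypes integer types do no overflow checking, so the value simply wraps.
    """
    return ctypes.c_int32(nbytes).value


if __name__ == "__main__":
    # Values up to INT_MAX survive; anything larger comes back wrong,
    # either negative or silently much smaller than the real count.
    for nbytes in (INT_MAX, INT_MAX + 1, 3 * 2**30, 2**32 + 4096):
        print(f"{nbytes:>12} -> {through_int32(nbytes):>12}")
```

A caller that trusts the wrapped value would advance its offsets by the wrong amount, which is consistent with a file system that only ever reports smaller amounts per call never hitting the problem.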

see https://debbugs.gnu.org/cgi/bugreport.cgi?bug=79139 and https://forum.proxmox.com/threads/proxmox-ve-9-0-iso-upload-corrupts-some-windows-server-2025-isos.169531/ for details

Describe how to reproduce the problem

see links above

Include any warning/errors/backtraces from the system logs

Fabian-Gruenbichler avatar Aug 13 '25 14:08 Fabian-Gruenbichler

Ouch, that's a nasty bug. Thanks for the heads up, I'm glad to see there's already a glibc fix for this.

behlendorf avatar Aug 13 '25 17:08 behlendorf

Dang, what a fascinating bug! Thanks for the report!

If it ever becomes an issue (like, some mainstream distro doesn't end up taking the patch) we can work around it pretty easily by limiting advance to <2G.
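
For callers that cannot wait for a patched glibc, the same idea can be sketched from the user-space side: never ask for 2 GiB or more in a single call. This is only an illustration under the assumption that the problem is triggered by one call reporting more than 2 GiB; it is not the ZFS-side change mentioned above, and `copy_capped`, `MAX_CHUNK`, and the injectable `copy` parameter are hypothetical names for this example.

```python
import os

# Hypothetical cap for this sketch, chosen to stay well below 2 GiB.
MAX_CHUNK = 1 << 30  # 1 GiB


def copy_capped(src_fd, dst_fd, length, max_chunk=MAX_CHUNK, copy=None):
    """Copy `length` bytes from src_fd to dst_fd in capped requests.

    `copy` defaults to os.copy_file_range (Linux only, Python 3.8+); it is a
    parameter only so the loop can be exercised without real files.
    Returns the number of bytes actually copied.
    """
    if copy is None:
        copy = os.copy_file_range
    copied = 0
    while copied < length:
        want = min(max_chunk, length - copied)
        n = copy(src_fd, dst_fd, want)
        if n == 0:  # source ended before `length` bytes were copied
            break
        copied += n
    return copied
```

With the default `copy`, both descriptors advance through their own file positions, so a caller would open the source for reading and the destination for writing and pass the source size as `length`.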

robn avatar Aug 13 '25 23:08 robn

the fix is already included in the stable branches for 2.41 and 2.42 which hopefully most distros will pick up reasonably soon. for Debian Trixie, it should be fixed in either the next point release (September) or the one after (November)

Fabian-Gruenbichler avatar Aug 14 '25 07:08 Fabian-Gruenbichler

Run a distribution which has more regular updates like Fedora. In Fedora 42 it has already been fixed since 2/2025. See here: https://koji.fedoraproject.org/koji/buildinfo?buildID=2663025

wiesl avatar Sep 03 '25 04:09 wiesl

> Run a distribution which has more regular updates like Fedora. In Fedora 42 it has already been fixed since 2/2025

You're mistaken there: the fix was committed to upstream master on August 1st and we backported it to the stable branches immediately after. Distributions (good ones at least!) track the stable branches and pull from them regularly. But it certainly couldn't have been fixed several months before it was first reported.

thesamesam avatar Sep 14 '25 17:09 thesamesam

FWIW, we found another instance of this in the wild where copying fails after a while with `EXDEV`, instead of corrupting the target file.

Fabian-Gruenbichler avatar Nov 13 '25 07:11 Fabian-Gruenbichler