`copy_file_range` / `cp --reflink=auto` broken on glibc 2.41/2.42 when used with ZFS
System information
| Type | Version/Name |
|---|---|
| Distribution Name | Proxmox VE |
| Distribution Version | 9.0 |
| Kernel Version | 6.14.8-2-pv |
| Architecture | x86_64 |
| OpenZFS Version | 2.3.3 |
Describe the problem you're observing
this is mostly a heads-up, and not an actionable bug report for you folks - I hope it is still appreciated since it might help triage incoming reports more quickly.
copying a large file using `cp` from ZFS can cause corruption if the file's data/hole layout is of a specific kind. after some analysis by @Blub, it turns out this is not a ZFS bug, but a glibc one affecting glibc 2.41 and 2.42.
it seems likely that ZFS is the only file system affected, since most (all?) other file systems will not return such large values: they process at most a chunk that fits in 32 bits per call and thus avoid the truncation. it also seems likely that `cp` (with `--reflink=auto`, which is the default in modern coreutils) is not the only caller of `copy_file_range` affected by this glibc bug.
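to make the truncation concrete, here is a minimal sketch of what happens to a byte count when it is narrowed to a signed 32-bit integer. this only illustrates the arithmetic and assumes the narrowing behaves like a plain C cast to `int`; it is not the glibc code, and `as_int32` is a hypothetical helper name.

```python
import ctypes

GIB = 1 << 30


def as_int32(n: int) -> int:
    """Narrow a byte count to a signed 32-bit integer, like a C cast to int.

    ctypes integer types do no overflow checking, so the value simply wraps.
    """
    return ctypes.c_int32(n).value


# Byte counts a single call might report, and what a caller would see
# if the value passed through a 32-bit signed integer on the way back.
for copied in (GIB, 2 * GIB - 1, 2 * GIB, 3 * GIB, 4 * GIB + 4096):
    print(f"{copied:>12} bytes copied -> caller sees {as_int32(copied)}")
```

anything below 2 GiB survives unchanged. larger counts either turn negative or shrink to a small positive number, so a caller would see something that looks like an error or like a much shorter copy than what actually happened.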
see https://debbugs.gnu.org/cgi/bugreport.cgi?bug=79139 and https://forum.proxmox.com/threads/proxmox-ve-9-0-iso-upload-corrupts-some-windows-server-2025-isos.169531/ for details
Describe how to reproduce the problem
see links above
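when checking whether a particular file might be a candidate, it can help to look at its data/hole layout. the sketch below lists it using `lseek` with `SEEK_DATA`/`SEEK_HOLE`. it assumes Linux and a file system that reports holes (otherwise the whole file shows up as one data segment), and `data_hole_layout` is a hypothetical helper name. which layouts actually trigger the bug is described in the links above, not here.

```python
import errno
import os


def data_hole_layout(path):
    """Return [(kind, start, end), ...] covering the file.

    kind is "data" or "hole"; end is exclusive.
    """
    segments = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                data_start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as exc:
                # ENXIO means there is no more data after pos.
                if exc.errno != errno.ENXIO:
                    raise
                data_start = size
            if data_start > pos:
                segments.append(("hole", pos, data_start))
            if data_start >= size:
                break
            # End of file counts as an implicit hole, so this always succeeds.
            hole_start = os.lseek(fd, data_start, os.SEEK_HOLE)
            segments.append(("data", data_start, hole_start))
            pos = hole_start
    finally:
        os.close(fd)
    return segments
```

for example, `data_hole_layout("/tank/isos/some.iso")` (path made up) returns a list of segments whose sizes can then be compared against the 2 GiB boundary.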
Include any warning/errors/backtraces from the system logs
Ouch, that's a nasty bug. Thanks for the heads up, I'm glad to see there's already a glibc fix for this.
Dang, what a fascinating bug! Thanks for the report!
If it ever becomes an issue (like, some mainstream distro doesn't end up taking the patch) we can work around it pretty easily by limiting advance to <2G.
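The same idea can be sketched from the caller's side: never ask for 2 GiB or more in one call, so the returned byte count always fits in a signed 32-bit integer. This is a userspace analogue under stated assumptions, not the ZFS-side change mentioned above. `copy_capped`, `_copy_once` and the `MAX_CHUNK` value are hypothetical; `os.copy_file_range` needs Linux and Python 3.8+, and the fallback path is a plain read/write that does not preserve holes.

```python
import errno
import os

# Assumed cap: strictly below 2 GiB. The exact value is illustrative.
MAX_CHUNK = (1 << 31) - 4096

# Errors for which a plain read/write copy is tried instead.
_FALLBACK_ERRNOS = (errno.ENOSYS, errno.EXDEV, errno.EINVAL, errno.EOPNOTSUPP)


def _copy_once(src_fd, dst_fd, count, offset):
    """Copy up to `count` bytes at `offset`; return the number of bytes copied."""
    try:
        return os.copy_file_range(src_fd, dst_fd, count, offset, offset)
    except (AttributeError, OSError) as exc:
        # AttributeError: this platform has no os.copy_file_range at all.
        if isinstance(exc, OSError) and exc.errno not in _FALLBACK_ERRNOS:
            raise
        buf = os.pread(src_fd, min(count, 1 << 20), offset)
        return os.pwrite(dst_fd, buf, offset)


def copy_capped(src_fd, dst_fd, length, max_chunk=MAX_CHUNK):
    """Copy `length` bytes from the start of src_fd to the start of dst_fd,
    requesting at most `max_chunk` bytes per call."""
    copied = 0
    while copied < length:
        want = min(max_chunk, length - copied)
        n = _copy_once(src_fd, dst_fd, want, copied)
        if n == 0:
            break  # source ended earlier than expected
        copied += n
    return copied
```

Explicit offsets are used instead of the file descriptors' own positions, so the loop only ever trusts its own running total and not state that a mangled return value could leave out of sync.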
the fix is already included in the stable branches for 2.41 and 2.42, which hopefully most distros will pick up reasonably soon. for Debian Trixie, it should be fixed in either the next point release (September) or the one after (November).
Run a distribution which has more regular updates like Fedora. In Fedora 42 it has already been fixed since 2/2025. See here: https://koji.fedoraproject.org/koji/buildinfo?buildID=2663025
> Run a distribution which has more regular updates like Fedora. In Fedora 42 it has already been fixed since 2/2025
You're mistaken there, the fix was committed to upstream master on August 1st and we backported it to the stable branches immediately after. Distributions (good ones at least!) track the stable branches and pull from them regularly. But certainly it couldn't have been fixed several months before it was first reported.
FWIW, we found another instance of this in the wild where copying fails after a while with EXDEV, instead of corrupting the target file.