easybuild-framework
copying of files from EasyBuild installation in CernVM-FS to installation directory on other filesystem fails due to xattrs
When using EasyBuild, it seems that the build directory must reside on a filesystem with 512-byte inodes. When using 256-byte inodes, EasyBuild fails while copying recipes to the build directory with errors such as the following:
ERROR: Build of /home/user/easy_buid/zlib-1.2.11-GCCcore-9.3.0.eb failed (err: "build failed (first 300 chars): Failed to copy file /cvmfs/soft.computecanada.ca/easybuild/site-packages/easybuild-easyblocks/easybuild/easyblocks/generic/configuremake.py to /home/user/tmp/eb-611zkjwg/reprod_20211204102755_332067/easyblocks/configuremake.py: [Errno 28] No space left on device: '/home/user/tmp/eb-611zkjwg/repr")
The issue seems to be that the inodes are too small to hold extended attributes set by EasyBuild. (The /home filesystem in this example has ample free space.) I tested creating two ext4 filesystems inside files, one with 256-byte inodes, the other with 512-byte ones, mounting them, and using them for build directories. Using the partition with 256-byte inodes, EasyBuild threw an error like the above. Using the partition with 512-byte inodes fixed the issue.
Since the default for many mkfs.* programs is 256-byte inodes, perhaps this should be documented as an EasyBuild requirement.
Digging into this issue a bit more, it seems that the cause is that our EasyBuild files are accessed through CVMFS, which puts a large number of xattrs on files. EasyBuild then uses shutil.copy2 rather than shutil.copy when copying recipes to the build directory, attempting to preserve these xattrs, which do not fit except on large inodes. I do not know if preserving metadata with copy2 was intended, or if there is a workaround to avoid EasyBuild trying to copy these xattrs. Using recipes in a CVMFS mount is rather common in our setup, so I would like to find a solution.
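To check how much extended-attribute data a source file actually carries, something like the following minimal sketch could be used. It relies on os.listxattr and os.getxattr, which are only available on Linux; the helper name xattr_sizes is made up for this illustration and is not part of EasyBuild.

```python
import os


def xattr_sizes(path):
    """Return {attribute name: value size in bytes} for path.

    Hypothetical helper for illustration; returns {} when extended
    attributes are unavailable (non-Linux platform, unsupported
    filesystem, or missing file).
    """
    if not hasattr(os, "listxattr"):
        # os.listxattr / os.getxattr only exist on Linux
        return {}
    sizes = {}
    try:
        names = os.listxattr(path)
    except OSError:
        # path does not exist, or the filesystem has no xattr support
        return {}
    for name in names:
        try:
            sizes[name] = len(os.getxattr(path, name))
        except OSError:
            # attribute could not be read by this user; skip it
            continue
    return sizes


if __name__ == "__main__":
    import sys

    for target in sys.argv[1:]:
        sizes = xattr_sizes(target)
        print("%s: %d xattrs, %d bytes of values" % (target, len(sizes), sum(sizes.values())))
```

Running this on a file under the CVMFS mount and on a copy of it elsewhere should show whether the attributes are what shutil.copy2 is trying to carry over.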
The use of shutil.copy2 is deliberate, mostly to ensure that metadata such as timestamps is preserved.
I wasn't aware that this causes trouble when stuff is being copied from a CernVM-FS mount to another filesystem.
We could work around that by using shutil.copy to copy stuff like the easyblocks that were used to the installation directory, but perhaps there's a better way...
@mboisson, @bartoldeman: any suggestions here?
I'm thinking perhaps it could catch the exception and retry with shutil.copyfile.
It already checks the uid, in this code (see easybuild/tools/filetools.py):
# if target file exists and is owned by someone else than the current user,
# try using shutil.copyfile to just copy the file contents
# since shutil.copy2 will fail when trying to copy over file metadata (since chown requires file ownership)
elif target_exists and os.stat(target_path).st_uid != os.getuid():
    shutil.copyfile(path, target_path)
    _log.info("Copied contents of file %s to %s", path, target_path)
else:
    mkdir(os.path.dirname(target_path), parents=True)
    if path_exists:
        shutil.copy2(path, target_path)
        _log.info("%s copied to %s", path, target_path)
Even simpler would be to use copyfile always and attempt copystat, ignoring any exceptions from it.
After all, shutil.copy2 is coded as follows (Python 3.7; Python 2.7 is essentially the same, minus the follow_symlinks argument and the return dst):
def copy2(src, dst, *, follow_symlinks=True):
    """Copy data and metadata. Return the file's destination.

    Metadata is copied with copystat(). Please see the copystat function
    for more information.

    The destination may be a directory.

    If follow_symlinks is false, symlinks won't be followed. This
    resembles GNU's "cp -P src dst".
    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst, follow_symlinks=follow_symlinks)
    copystat(src, dst, follow_symlinks=follow_symlinks)
    return dst
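Following the structure of copy2, the "copyfile always, copystat best-effort" idea could look roughly like the sketch below. The function name copy_file_best_effort is hypothetical, and this is not the code in EasyBuild; unlike copy2, it also does not handle the case where the destination is a directory.

```python
import shutil


def copy_file_best_effort(src, dst):
    """Copy file contents, then try to copy metadata.

    Hypothetical sketch: any OSError raised while copying metadata
    (permissions, timestamps, xattrs) is silently ignored.
    """
    # copying the contents must succeed; errors here still propagate
    shutil.copyfile(src, dst)
    try:
        shutil.copystat(src, dst)
    except OSError:
        # e.g. [Errno 28] when the xattrs do not fit on the target filesystem
        pass
    return dst
```

The trade-off is that a failure to copy permission bits or timestamps goes unnoticed as well, not just a failure to copy xattrs.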
Just a note, this does not really have anything to do with CVMFS. It has to do with copying files from a source filesystem that has longer extra attributes (in this case 512 bytes) than the destination filesystem (in this case 256 bytes).
"Even simpler would be to use copyfile always and attempt copystat, ignoring any exceptions from it."
That's definitely worth considering... I wonder if that could lead to unwanted results though.
Is there a scenario when copying the file worked, but copying the metadata fails for a "good" reason (i.e. one we'd like to know about)?
I guess an option would be to use Bart's approach but convert exceptions raised by copystat to warnings.
I've implemented the change that @bartoldeman proposed in #3912. It works well in the sense that no tests were harmed, but I'm still wondering whether "silently" ignoring a failure to copy metadata is wise...
For stuff like timestamps that's probably not a huge problem, but for file permissions, it could be an issue: if we're copying a binary, but the execution permissions are not being set, and we simply ignore that, confusing problems may pop up elsewhere.
So maybe we should make this configurable, so people can opt-in to ignoring failing to copy file metadata?
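As a rough sketch of what such an opt-in could look like: the function below raises on metadata failures by default, and only downgrades them to a warning when asked to. The names copy_file_with_metadata and ignore_metadata_errors are invented for this example and do not correspond to an existing EasyBuild function or configuration option.

```python
import shutil
import warnings


def copy_file_with_metadata(src, dst, ignore_metadata_errors=False):
    """Copy file contents and metadata, optionally tolerating metadata failures.

    Hypothetical sketch: with ignore_metadata_errors=False (the default),
    this behaves like a strict copy and re-raises; with True, a failure
    to copy metadata is reported as a warning instead.
    """
    shutil.copyfile(src, dst)
    try:
        shutil.copystat(src, dst)
    except OSError as err:
        if not ignore_metadata_errors:
            raise
        warnings.warn("Copied %s to %s, but failed to copy metadata: %s" % (src, dst, err))
    return dst
```

Emitting a warning rather than passing silently keeps the failure visible in the output, which matters for the file permissions case mentioned above.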
Copying stuff from a filesystem with extended attributes to another filesystem that doesn't support this seems to be special enough that this problem hasn't popped up in ~10 years, so it's definitely not a common use case...
This seems to have been triggered by a change in the size of extra attributes in CVMFS with clients of version 2.9.0.
As discussed in Slack, the proper way out here is probably to configure CernVM-FS to hide the extended attributes via CVMFS_HIDE_MAGIC_XATTRS (see also https://cvmfs.readthedocs.io/en/stable/apx-parameters.html).
Those extended attributes only make sense for files parked in a CVMFS filesystem; they are essentially useless when files are being copied out to somewhere else...
Hiding the extended attributes in CVMFS only solves the problem for CVMFS; Lustre may also have larger attributes and is susceptible to the same problem. Also see #4234.