
Problems with dcp/dsync with ACLs

Open akesandgren opened this issue 4 years ago • 9 comments

Using version 0.10.1

Simple test case:

b-cn0123 [ake]$ ls -lR dcp-test
dcp-test:
total 4
dr-xr-x---+ 3 ake folk 4096 Oct  2 21:02 q/

dcp-test/q:
total 4
dr-xr-x---+ 2 ake folk 4096 Oct  2 20:30 r/

dcp-test/q/r:
total 0
-r--r-x---+ 1 ake folk 0 Oct  2 20:30 file*

b-cn0123 [ake]$ getfacl -R dcp-test/
# file: dcp-test/
# owner: ake
# group: folk
user::r-x
user:yyy:r-x
group::r-x
mask::r-x
other::---

# file: dcp-test//q
# owner: ake
# group: folk
user::r-x
user:yyy:r-x
group::---
mask::r-x
other::---

# file: dcp-test//q/r
# owner: ake
# group: folk
user::r-x
user:yyy:r-x
group::---
mask::r-x
other::---

# file: dcp-test//q/r/file
# owner: ake
# group: folk
user::r--
user:yyy:r-x
group::---
mask::r-x
other::---

And this is the result:

b-cn0123 [stor10]$ mpirun -n 1 dcp --preserve /pfs/nobackup/home/a/ake/dcp-test /pfs/stor10/proj-test
[2020-10-02T21:13:03] Preserving file attributes.
[2020-10-02T21:13:03] Walking /pfs/nobackup/home/a/ake/dcp-test
[2020-10-02T21:13:03] Walked 4 items in 0.008278 secs (483.180546 items/sec) ...
[2020-10-02T21:13:03] Walked 4 items in 0.008357 seconds (478.636938 items/sec)
[2020-10-02T21:13:03] Copying to /pfs/stor10/proj-test
[2020-10-02T21:13:03] Items: 4
[2020-10-02T21:13:03]   Directories: 3
[2020-10-02T21:13:03]   Files: 1
[2020-10-02T21:13:03]   Links: 0
[2020-10-02T21:13:03] Data: 0.000 B (0.000 B per file)
[2020-10-02T21:13:03] Creating directories.
[2020-10-02T21:13:03]   level=6 min=1 max=1 sum=1 rate=118.951962/sec secs=0.008407
[2020-10-02T21:13:03] [0] [/scratch/eb-buildpath/mpifileutils/0.10.1/gompi-2020a/mpifileutils-0.10.1/src/common/mfu_flist_copy.c:866] ERROR: Create `/pfs/stor10/proj-test/dcp-test/q' mkdir() failed (errno=13 Permission denied)
[2020-10-02T21:13:03]   level=7 min=1 max=1 sum=1 rate=1242.445929/sec secs=0.000805
[2020-10-02T21:13:03] [0] [/scratch/eb-buildpath/mpifileutils/0.10.1/gompi-2020a/mpifileutils-0.10.1/src/common/mfu_flist_copy.c:866] ERROR: Create `/pfs/stor10/proj-test/dcp-test/q/r' mkdir() failed (errno=2 No such file or directory)
[2020-10-02T21:13:03]   level=8 min=1 max=1 sum=1 rate=26178.695777/sec secs=0.000038
[2020-10-02T21:13:03]   level=9 min=0 max=0 sum=0 rate=0.000000/sec secs=0.000000
[2020-10-02T21:13:03] Created 3 directories in 0.009315 seconds (322.064200 items/sec)
[2020-10-02T21:13:03] Creating files.
[2020-10-02T21:13:03]   level=6 min=0 max=0 sum=0 rate=0.000000 secs=0.000002
[2020-10-02T21:13:03]   level=7 min=0 max=0 sum=0 rate=0.000000 secs=0.000000
[2020-10-02T21:13:03]   level=8 min=0 max=0 sum=0 rate=0.000000 secs=0.000000
[2020-10-02T21:13:03] [0] [/scratch/eb-buildpath/mpifileutils/0.10.1/gompi-2020a/mpifileutils-0.10.1/src/common/mfu_flist_copy.c:1113] ERROR: File `/pfs/stor10/proj-test/dcp-test/q/r/file' mknod() failed (errno=2 No such file or directory)
[2020-10-02T21:13:03]   level=9 min=1 max=1 sum=1 rate=21728.266302 secs=0.000046
[2020-10-02T21:13:03] Created 1 items in 0.000194 seconds (5160.331500 items/sec)
[2020-10-02T21:13:03] Copying data.
[2020-10-02T21:13:03] [0] [/scratch/eb-buildpath/mpifileutils/0.10.1/gompi-2020a/mpifileutils-0.10.1/src/common/mfu_flist_copy.c:1926] ERROR: Failed to open output file `/pfs/stor10/proj-test/dcp-test/q/r/file' (errno=2 No such file or directory)
[2020-10-02T21:13:03] [0] [/scratch/eb-buildpath/mpifileutils/0.10.1/gompi-2020a/mpifileutils-0.10.1/src/common/mfu_flist_copy.c:2059] ERROR: Failed to copy `/pfs/nobackup/home/a/ake/dcp-test/q/r/file' to `/pfs/stor10/proj-test/dcp-test/q/r/file'
[2020-10-02T21:13:03] Copy data: 0.000 B (0 bytes)
[2020-10-02T21:13:03] Copy rate: 0.000 B/s (0 bytes in 0.002185 seconds)
[2020-10-02T21:13:03] Syncing data to disk.
[2020-10-02T21:13:03] Sync completed in 0.000652 seconds.
[2020-10-02T21:13:03] Setting ownership, permissions, and timestamps.
[2020-10-02T21:13:03] [0] [/scratch/eb-buildpath/mpifileutils/0.10.1/gompi-2020a/mpifileutils-0.10.1/src/common/mfu_flist_copy.c:328] ERROR: Failed to change ownership on `/pfs/stor10/proj-test/dcp-test/q/r/file' lchown() (errno=2 No such file or directory)
[2020-10-02T21:13:03] [0] [/scratch/eb-buildpath/mpifileutils/0.10.1/gompi-2020a/mpifileutils-0.10.1/src/common/mfu_flist_copy.c:354] ERROR: Failed to change permissions on `/pfs/stor10/proj-test/dcp-test/q/r/file' chmod() (errno=2 No such file or directory)
[2020-10-02T21:13:03] [0] [/scratch/eb-buildpath/mpifileutils/0.10.1/gompi-2020a/mpifileutils-0.10.1/src/common/mfu_flist_copy.c:491] ERROR: Failed to change timestamps on `/pfs/stor10/proj-test/dcp-test/q/r/file' utime() (errno=2 No such file or directory)
[2020-10-02T21:13:03] [0] [/scratch/eb-buildpath/mpifileutils/0.10.1/gompi-2020a/mpifileutils-0.10.1/src/common/mfu_flist_copy.c:328] ERROR: Failed to change ownership on `/pfs/stor10/proj-test/dcp-test/q/r' lchown() (errno=2 No such file or directory)
[2020-10-02T21:13:03] [0] [/scratch/eb-buildpath/mpifileutils/0.10.1/gompi-2020a/mpifileutils-0.10.1/src/common/mfu_flist_copy.c:354] ERROR: Failed to change permissions on `/pfs/stor10/proj-test/dcp-test/q/r' chmod() (errno=2 No such file or directory)
[2020-10-02T21:13:03] [0] [/scratch/eb-buildpath/mpifileutils/0.10.1/gompi-2020a/mpifileutils-0.10.1/src/common/mfu_flist_copy.c:491] ERROR: Failed to change timestamps on `/pfs/stor10/proj-test/dcp-test/q/r' utime() (errno=2 No such file or directory)
[2020-10-02T21:13:03] [0] [/scratch/eb-buildpath/mpifileutils/0.10.1/gompi-2020a/mpifileutils-0.10.1/src/common/mfu_flist_copy.c:328] ERROR: Failed to change ownership on `/pfs/stor10/proj-test/dcp-test/q' lchown() (errno=2 No such file or directory)
[2020-10-02T21:13:03] [0] [/scratch/eb-buildpath/mpifileutils/0.10.1/gompi-2020a/mpifileutils-0.10.1/src/common/mfu_flist_copy.c:354] ERROR: Failed to change permissions on `/pfs/stor10/proj-test/dcp-test/q' chmod() (errno=2 No such file or directory)
[2020-10-02T21:13:03] [0] [/scratch/eb-buildpath/mpifileutils/0.10.1/gompi-2020a/mpifileutils-0.10.1/src/common/mfu_flist_copy.c:491] ERROR: Failed to change timestamps on `/pfs/stor10/proj-test/dcp-test/q' utime() (errno=2 No such file or directory)
[2020-10-02T21:13:03] Updated 4 items in 0.008529 seconds (468.992447 items/sec)
[2020-10-02T21:13:03] Syncing directory updates to disk.
[2020-10-02T21:13:03] Sync completed in 0.000095 seconds.
[2020-10-02T21:13:03] Started: Oct-02-2020,21:13:03
[2020-10-02T21:13:03] Completed: Oct-02-2020,21:13:03
[2020-10-02T21:13:03] Seconds: 0.021
[2020-10-02T21:13:03] Items: 1
[2020-10-02T21:13:03]   Directories: 1
[2020-10-02T21:13:03]   Files: 0
[2020-10-02T21:13:03]   Links: 0
[2020-10-02T21:13:03] Data: 0.000 B (0 bytes)
[2020-10-02T21:13:03] Rate: 0.000 B/s (000 bytes in 0.021 seconds)

The same test with a directory without ACLs works perfectly.

akesandgren avatar Oct 02 '20 18:10 akesandgren

The problem seems to be in mfu_copy_xattrs when it is called from mfu_create_directory. The xattrs include system.posix_acl_default and system.posix_acl_access. When chmod -w has been done on the source directory, the system.posix_acl_access xattr contains user::r-x, which results in a write-protected target directory.

I suggest filtering out any system.posix_acl xattrs in mfu_copy_xattrs, like in #401

akesandgren avatar Oct 03 '20 07:10 akesandgren

I think part of the issue here is that mfu_copy_xattrs is called immediately after creating each directory/file, to accommodate Lustre striping params: https://github.com/daltonbohning/mpifileutils/blob/4ec784108d066abe5060ebb197c1dba83e88bd7d/src/common/mfu_flist_copy.c#L933-L942

In contrast, ownership, permissions, and timestamps are copied after copying the files, and starting from the deepest level, to handle similar issues with standard permissions.
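The deepest-first ordering described above can be illustrated with a small Python sketch. This is not the mpifileutils code; the helper names `depth`, `deepest_first`, and `demo` are hypothetical, and the modes are taken from the test case in this issue. Directories are created writable first, and the restrictive modes are applied only at the end, children before parents.

```python
import os
import shutil
import tempfile


def depth(path):
    """Number of path components, used here as the tree level."""
    return len(os.path.normpath(path).split(os.sep))


def deepest_first(paths):
    """Order paths so that children come before their parents."""
    return sorted(paths, key=depth, reverse=True)


def demo():
    """Create q/r writable, then apply read-only modes deepest first."""
    root = tempfile.mkdtemp()
    q = os.path.join(root, "q")
    r = os.path.join(q, "r")
    try:
        os.makedirs(r)  # directories start out with default, writable modes
        modes = {q: 0o550, r: 0o550}
        order = deepest_first(modes)
        for path in order:
            os.chmod(path, modes[path])
        return [os.path.relpath(p, root) for p in order]
    finally:
        # Restore write permission so the temporary tree can be removed.
        for path in (q, r):
            if os.path.isdir(path):
                os.chmod(path, 0o700)
        shutil.rmtree(root)
```

Applying the modes in the opposite order would still work for chmod itself, but any step that has to create or modify entries inside a directory must happen before that directory loses its write bit, which is the constraint the xattr copy currently violates.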

I wonder if there is a solution that will allow the system.posix_acl* xattrs to still be copied, while also circumventing this permission issue during directory creation.

daltonbohning avatar Oct 16 '20 00:10 daltonbohning

Well, if they can be edited so that you have u+w, then it would be OK to copy them at that point in time.

akesandgren avatar Oct 16 '20 05:10 akesandgren

Ideally, a full solution might look something like:

// copy lustre xattrs before copying files
setxattr(dst_dir) for each "lustre.*" xattr

// copy the files

// copy all other xattrs
setxattr(dst_dir) for all other non-lustre xattr

Or, we could filter out system.* attrs in the first pass, and then the second pass would add those in.

This would require two passes over the xattrs and some filtering, which would affect performance. Perhaps the performance hit could be mitigated by using flags to determine whether one or both passes are necessary. @adammoody Do you have any thoughts on this? DAOS also utilizes the system.posix_acl* attributes, so just not copying those at all would not be desirable.
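A rough Python sketch of the two-pass split, under the assumption that only `lustre.*` attributes must be set before file data is written. The names `PRE_COPY_PATTERNS`, `split_xattrs`, and `needed_passes` are hypothetical and do not exist in mpifileutils, and the attribute names in the usage note are only sample input.

```python
from fnmatch import fnmatchcase

# Assumption for illustration: only Lustre attributes are needed before
# the data copy; everything else can wait until after it.
PRE_COPY_PATTERNS = ("lustre.*",)


def split_xattrs(names, pre_patterns=PRE_COPY_PATTERNS):
    """Partition xattr names into (before-copy, after-copy) lists."""
    pre, post = [], []
    for name in names:
        if any(fnmatchcase(name, pattern) for pattern in pre_patterns):
            pre.append(name)
        else:
            post.append(name)
    return pre, post


def needed_passes(names, pre_patterns=PRE_COPY_PATTERNS):
    """Return two flags: is a pre-copy pass needed, is a post-copy pass needed."""
    pre, post = split_xattrs(names, pre_patterns)
    return bool(pre), bool(post)
```

For example, `split_xattrs(["lustre.lov", "system.posix_acl_access"])` puts the first name in the pre-copy list and the second in the post-copy list, and `needed_passes` lets a caller skip a pass entirely when its list is empty.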

daltonbohning avatar Oct 16 '20 17:10 daltonbohning

@daltonbohning typically, system xattrs should not be copied directly, since they represent internal filesystem state. ACLs and such should be copied via the appropriate APIs, see GNU tar, for example.

adilger avatar Oct 17 '20 17:10 adilger

@adilger On my system at least, cp --preserve=xattr src dst does seem to copy xattrs set with setfacl (CentOS Linux release 7.8.2003)

Running strace cp --preserve src dst:

...
fgetxattr(3, "system.posix_acl_access", ...)
fsetxattr(4, "system.posix_acl_access", ...)
...

So cp does explicitly copy over the ACLs through xattr access.

I think the underlying issue with dcp isn't that the system.* xattrs are being copied incorrectly (i.e. with the wrong API), but that they are being copied at the wrong time. I think it could be possible to implement a sort of "two-pass" xattr copy, which would be compatible with both Lustre (needs lustre.* xattrs before copying files) and other setups such as yours (need system.* xattrs after copying files).

Also, something that isn't yet supported is --preserve=[ATTR_LIST], which would allow specifically copying only some things (mode, ownership, timestamps, xattrs)

daltonbohning avatar Oct 17 '20 18:10 daltonbohning

Yes, I get the impression that full xattr support needs to be generalized in a couple of ways. It seems like some xattrs must be copied before and some settings should be copied after. We've also talked about ways to allow a user to filter out some xattr settings completely.

Related issues: https://github.com/hpc/mpifileutils/issues/324 https://github.com/hpc/mpifileutils/issues/49

adammoody avatar Oct 18 '20 21:10 adammoody

Ah, yes! That would be a more complete solution.

daltonbohning avatar Oct 18 '20 22:10 daltonbohning

I think there are a couple of options in this case:

  • add --xattr-include=<regexp> and --xattr-exclude=<regexp> options (allow multiple) to filter xattrs
  • use /etc/xattr.conf to filter xattrs, which is what libattr.so checks
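The first option could behave roughly like the Python sketch below. The flags are only the ones proposed here and are not existing dcp options. The matching rule is an assumption: a name is kept if it matches at least one include pattern (or no include pattern was given) and matches no exclude pattern, using an unanchored regex search.

```python
import argparse
import re


def build_parser():
    """Parser with the two proposed, repeatable filter options."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--xattr-include", action="append", default=[],
                        metavar="REGEXP")
    parser.add_argument("--xattr-exclude", action="append", default=[],
                        metavar="REGEXP")
    return parser


def select_xattrs(names, include, exclude):
    """Keep names that pass the include patterns and hit no exclude pattern."""
    include_res = [re.compile(p) for p in include]
    exclude_res = [re.compile(p) for p in exclude]
    kept = []
    for name in names:
        if include_res and not any(r.search(name) for r in include_res):
            continue
        if any(r.search(name) for r in exclude_res):
            continue
        kept.append(name)
    return kept
```

With `--xattr-exclude '^system\.posix_acl_'`, the ACL xattrs from the test case in this issue would be dropped while other attributes are still copied.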

I can't find any documentation for /etc/xattr.conf in a man page, but the Git repo has an example file like:

# /etc/xattr.conf
#
# Format:
# <pattern> <action>
#
# Actions:
#   permissions - copy when trying to preserve permissions.
#   skip - do not copy.
system.nfs4_acl                 permissions
system.nfs4acl                  permissions
system.posix_acl_access         permissions
system.posix_acl_default        permissions
trusted.SGI_ACL_DEFAULT         skip            # xfs specific
trusted.SGI_ACL_FILE            skip            # xfs specific
trusted.SGI_CAP_FILE            skip            # xfs specific
trusted.SGI_DMI_*               skip            # xfs specific
trusted.SGI_MAC_FILE            skip            # xfs specific
xfsroot.*                       skip            # xfs specific; obsolete
user.Beagle.*                   skip            # ignore Beagle index data
security.evm                    skip            # may only be written by kernel
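A file in this format could be read with something like the Python sketch below. It is written only from the example above, so the details are assumptions: `#` starts a comment anywhere on a line, each rule is a pattern followed by an action, patterns are shell-style globs, and the first matching rule wins. The function names `parse_xattr_conf` and `action_for` are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_xattr_conf(text):
    """Return a list of (pattern, action) rules from xattr.conf-style text."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            continue  # ignore lines that are not "<pattern> <action>"
        rules.append((fields[0], fields[1]))
    return rules


def action_for(name, rules):
    """Return the action of the first rule matching name, or None."""
    for pattern, action in rules:
        if fnmatchcase(name, pattern):
            return action
    return None


SAMPLE = """\
# <pattern> <action>
system.posix_acl_access         permissions
system.posix_acl_default        permissions
trusted.SGI_DMI_*               skip            # xfs specific
security.evm                    skip            # may only be written by kernel
"""
```

A copy tool could then defer names whose action is `permissions` to the pass that sets ownership and modes, and drop names whose action is `skip`.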

adilger avatar Oct 20 '20 03:10 adilger