pyinfra icon indicating copy to clipboard operation
pyinfra copied to clipboard

files.directory doesn't ensure recursive ownership (add op arg)

Open missytake opened this issue 4 months ago • 2 comments

Describe the bug

In https://github.com/pyinfra-dev/pyinfra/blob/3.x/pyinfra/operations/files.py#L1609 only ownership of the top-level directory is checked, not of the subdirectories and files, also if you specify recursive=True.

To Reproduce

Steps to reproduce the behavior, please include where possible:

Create this directory structure:

$ ls -la test/
total 8
drwxrwxr-x  2 user     user 4096 Aug 16 09:38 .
drwxrwxr-x 15 user     user 4096 Aug 16 09:38 ..
-rw-rw-r--  1 www-data root    0 Aug 16 09:38 file

And apply the following operation to it:

files.directory(
  path="test",
  user="user",
  group="user",
  recursive=True,
)

Expected behavior

If you call files.directory with recursive=True and the user or group argument, you expect that it ensures that the directory and its subdirectories and all files in them are ensured to be owned by the specified user/group.

Recursive ownership could be checked with a simple find $path ! -user $user -group $group; for the largest directory we have (500k maildir directories), this check took ~13 minutes, compared to just running a chown, which took ~48 minutes. This can be a performance nightmare of course; maybe instead of fixing this, the current behavior should simply be documented. Maybe there are also faster options of checking recursive ownership I don't know about.

Thanks @cliffmccarthy for finding this: https://github.com/chatmail/relay/pull/609#discussion_r2263429306

missytake avatar Aug 16 '25 09:08 missytake

Given the performance overhead of merely checking, it seems to me the current behaviour should be documented, explaining that recursive only applies to the action taken, and that only the state of the base directory is checked when deciding whether to take action. I looked through the code history and it seems to have been this way for a long time, so changing existing functionality could be undesirable.

If we want to provide the ability to guarantee the final ownership of the entire tree without calling a potentially-slow chown every time, maybe another new argument could be added. A check_recursive argument could enable something like the find command above and use that to decide whether to call the recursive chown or not. This new argument would default to false, and could be documented as having potentially high overhead depending on the size of the tree. That way this would only affect callers that explicitly ask for this.

(What I would love to have is a "lazy chown" command, that only touches the inode if it isn't already in the desired state, but as far as I know, there isn't anything standard like that, so we're going to be bumping inode change times ("ctime") of the whole tree any time part of the tree doesn't have the desired ownership.) (And that operation would be even slower in some cases because it would have to stat() every inode before possibly calling chown() on it.)

cliffmccarthy avatar Aug 17 '25 15:08 cliffmccarthy

Completely agree this should be documented and the option added to do the checks recursively. Flagging as bug until the documentation bit is done as this will cause confusion.

Fizzadar avatar Sep 12 '25 12:09 Fizzadar