[Feature Request] Allow user to specify number of processes?
First off: thank you for writing and publishing fixuid — it solves a problem I've been having with development Docker containers for a while now!
I've been trying to use it for a development image I maintain, but I'm struggling with the startup time. The user I create in the Dockerfile has (unfortunately) many files in their home directory. This is mostly due to installing package managers (in this case, miniconda) with a few default environments pre-packaged. As such, fixuid takes ~9.5 minutes to do its thing on the user's home directory (e.g., if the user's name is wally, under /home/wally).
I know from #31 that we can set paths in the configuration to the specific paths that we'd like fixuid to scan. That might work, but I think ultimately I would want to run fixuid over the wally user's working directory. In the vein of that issue's request for a progress bar, I was also wondering what considerations the library has about manually setting the number of max processes via https://github.com/boxboat/fixuid/blob/master/fixuid.go#L28. I can imagine that part of it is a security mitigation, since the script requires elevated permissions to run. Is part of the decision also related to the idempotency of the chown command?
I've been meaning to build something in Rust for a while, and if adding support for multi-core/parallel processes is cumbersome, I'd be happy to give it a shot. Let me know!
Right now the entire program is sequential, so it can't really take advantage of more than one proc.
It probably makes sense to make the chown calls concurrent and size the concurrency by GOMAXPROCS, which defaults to the number of CPUs.
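A minimal sketch of what concurrent chown calls could look like, assuming a single walker goroutine feeding a worker pool sized by `runtime.GOMAXPROCS(0)`. The names `chownMatching` and `runDemo` are hypothetical, and the rule "only touch entries owned by the old uid and gid" is an assumption about the matching logic, not a copy of fixuid's code. It is Unix-only because it reads `syscall.Stat_t`.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"runtime"
	"sync"
	"syscall"
)

// chownMatching walks root on one goroutine and sends every entry owned by
// oldUID:oldGID to a pool of workers that call os.Lchown concurrently.
// It returns how many entries were changed.
func chownMatching(root string, oldUID, oldGID uint32, newUID, newGID int) (int, error) {
	paths := make(chan string, 256)
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		changed  int
		firstErr error
	)

	// One worker per GOMAXPROCS slot (an assumption, not a tuned value).
	for i := 0; i < runtime.GOMAXPROCS(0); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range paths {
				err := os.Lchown(p, newUID, newGID)
				mu.Lock()
				if err == nil {
					changed++
				} else if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}()
	}

	// filepath.Walk calls Lstat on each entry, which supplies the uid/gid.
	walkErr := filepath.Walk(root, func(p string, info fs.FileInfo, err error) error {
		if err != nil {
			return err
		}
		st, ok := info.Sys().(*syscall.Stat_t)
		if ok && st.Uid == oldUID && st.Gid == oldGID {
			paths <- p
		}
		return nil
	})
	close(paths)
	wg.Wait()

	if walkErr != nil {
		return changed, walkErr
	}
	return changed, firstErr
}

// runDemo builds a temp dir with two files and "changes" ownership to the
// owner it already has, so it works without root.
func runDemo() (int, error) {
	dir, err := os.MkdirTemp("", "chown-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	for _, name := range []string{"a", "b"} {
		if err := os.WriteFile(filepath.Join(dir, name), nil, 0o644); err != nil {
			return 0, err
		}
	}
	info, err := os.Lstat(dir)
	if err != nil {
		return 0, err
	}
	st := info.Sys().(*syscall.Stat_t)
	return chownMatching(dir, st.Uid, st.Gid, int(st.Uid), int(st.Gid))
}

func main() {
	n, err := runDemo()
	fmt.Println("changed entries:", n, "error:", err)
}
```

The walk itself stays sequential here; only the chown syscalls run in parallel, so the gain depends on how much of the runtime is spent in chown rather than in the walk.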
~There is also a more efficient walk function that was introduced in 1.16: https://pkg.go.dev/path/filepath#WalkDir~
Turns out that function is more efficient because it doesn't call Lstat on each file or directory walked. But that is the call that provides the uid/gid information, so we can't switch to WalkDir.
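To illustrate the point about WalkDir, here is a small sketch (hypothetical names `countOwnedBy` and `runDemo`, Unix-only). `filepath.WalkDir` hands the callback an `fs.DirEntry`, which carries only a name and type; reading the owner requires an explicit `d.Info()` call per entry, and on Unix that generally means a stat per entry again.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"syscall"
)

// countOwnedBy counts entries under root whose owner is uid.
// The owner is not part of fs.DirEntry, so every entry needs d.Info(),
// which gives back the per-entry stat cost that WalkDir was avoiding.
func countOwnedBy(root string, uid uint32) (int, error) {
	n := 0
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		if st, ok := info.Sys().(*syscall.Stat_t); ok && st.Uid == uid {
			n++
		}
		return nil
	})
	return n, err
}

// runDemo creates a temp dir with two files and counts the entries owned
// by the current effective user.
func runDemo() (int, error) {
	dir, err := os.MkdirTemp("", "walkdir-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	for _, name := range []string{"a", "b"} {
		if err := os.WriteFile(filepath.Join(dir, name), nil, 0o644); err != nil {
			return 0, err
		}
	}
	return countOwnedBy(dir, uint32(os.Geteuid()))
}

func main() {
	n, err := runDemo()
	fmt.Println("owned entries:", n, "error:", err)
}
```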
There might be design space for the more efficient version. If there is a separate setting to force the uid/gid on a folder, recursively, regardless of its current ownership, then that code path can use the new WalkDir. And it can also be useful in some situations.
For example, I have a container where a ton of files in one subfolder are first created by root in the upstream image (which I do not control). The downstream image then adds a non-root user and has to chown that whole folder to the new user. Because Docker has no way to record just an ownership change in a layer, this produces a new layer in which the whole folder (~500 MB) shows up as changed. Being able to tell fixuid "oh and that folder should also be owned by this user" would be of great benefit.
And some users that do not care about the start permissions of the files in some of their (larger) folders could use it as a performance improvement as well.
Could be called "force_paths" in the config file.
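A rough sketch of what such a forced code path might look like, assuming it simply chowns every entry without inspecting the current owner. `forceChown` and `runDemo` are hypothetical names, and `force_paths` is only the config key proposed in this thread, not an existing fixuid option. Because no owner check is needed, `filepath.WalkDir` can be used without calling `d.Info()`.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// forceChown sets uid:gid on root and everything below it, regardless of
// the current owner. It never reads file metadata, so WalkDir suffices.
// Lchown is used so that symlinks are changed rather than followed.
func forceChown(root string, uid, gid int) (int, error) {
	n := 0
	err := filepath.WalkDir(root, func(p string, _ fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if err := os.Lchown(p, uid, gid); err != nil {
			return err
		}
		n++
		return nil
	})
	return n, err
}

// runDemo chowns a small temp tree to the current effective uid/gid,
// which is permitted without root.
func runDemo() (int, error) {
	dir, err := os.MkdirTemp("", "force-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	sub := filepath.Join(dir, "sub")
	if err := os.Mkdir(sub, 0o755); err != nil {
		return 0, err
	}
	for _, p := range []string{filepath.Join(dir, "a"), filepath.Join(sub, "b")} {
		if err := os.WriteFile(p, nil, 0o644); err != nil {
			return 0, err
		}
	}
	return forceChown(dir, os.Geteuid(), os.Getegid())
}

func main() {
	n, err := runDemo()
	fmt.Println("chowned entries:", n, "error:", err)
}
```

The trade-off is that any deliberately different ownership inside the folder is overwritten, which is exactly the behavior the proposal asks for.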
I don't think we want to expand the scope of this project past what it currently is. The chown functionality is mainly for switching ownership of the $HOME dir, but since many images also drop in user-owned dirs to places like /src or /app that necessitated recursing the entire FS by default.
While multi-process would help speed up a large number of chown calls, it would also somewhat complicate the codebase. I'm not sure the change is worth it.
At the end of the day folks could probably write a script with find ... -print0 | parallel -0 chown ... for anything custom, put a setuid bit on that, then make it part of the entrypoint pipeline.
> And some users that do not care about the start permissions of the files in some of their (larger) folders
paths can already do this - the user can specify only the directories that they want changed
That's not what I meant. "paths" only changes file/folder permissions if the file is owned by the old uid/gid. What I am proposing is an option that would change ownership of all files in the specified folder, regardless of who was their owner before. That would also (very efficiently) solve the root issue of the original feature request without adding too much extra code - no extra processes or threads needed.
Also, your proposed solution would not really work. For one, there would be a huge IPC bottleneck between find and parallel/chown, and for another, setuid does not work on shell scripts. Which is why fixuid is a binary to begin with.
> What I am proposing is an option that would change ownership of all files in the specified folder
I can see how that would be valuable, but it's a slippery slope to expand the scope of this project.
The scope right now is to change up to 1 UID/GID and chown files/dirs to match that change and I'd like to keep it at that.
> setuid does not work on shell scripts
Forgot that little detail :man_facepalming: - so would need a script + sudo in that case
Yes, adding functionality, especially to such a tight component, always needs a very good reason, preferably several. That is why, instead of multiple processes or other complications, I am proposing a more straightforward feature that does what fixuid is already doing, but in a more direct and efficient way. It could cover (at least) two use cases:
- handle large home/app folders faster (if complex ownership structures inside them are not important)
- not needing to pre-chown folders to the in-container uid/gid for pre-existing files (which adds large Docker layers)
And it could also be implemented in a couple of lines just by calling the external chown binary.
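For reference, shelling out to chown from Go could look roughly like the sketch below. It assumes a `chown` binary on `PATH` that accepts `-R` and `--` (GNU coreutils and BusyBox do). `chownArgs`, `forceChownExternal`, and `runDemo` are hypothetical names, and how symlinks are treated is left to whatever chown implementation is installed.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// chownArgs builds the argument list for a recursive chown.
// "--" ends option parsing so a path starting with "-" is not
// mistaken for a flag.
func chownArgs(uid, gid int, path string) []string {
	owner := strconv.Itoa(uid) + ":" + strconv.Itoa(gid)
	return []string{"-R", "--", owner, path}
}

// forceChownExternal runs the external chown binary and returns its
// combined output as part of the error if it fails.
func forceChownExternal(uid, gid int, path string) error {
	out, err := exec.Command("chown", chownArgs(uid, gid, path)...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("chown failed: %w: %s", err, out)
	}
	return nil
}

// runDemo chowns a temp dir to the current effective uid/gid, which does
// not require root.
func runDemo() error {
	dir, err := os.MkdirTemp("", "chown-exec-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	if err := os.WriteFile(dir+"/a", nil, 0o644); err != nil {
		return err
	}
	return forceChownExternal(os.Geteuid(), os.Getegid(), dir)
}

func main() {
	fmt.Println("error:", runDemo())
}
```

This keeps the Go side tiny, at the cost of requiring a chown binary inside every image that uses it.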