Docker user should not be root
Running as root makes it impossible to use NFS on Wikimedia VPS as a cache. On top of this, not running as root is a security recommendation.
You shall now create one such issue in every scraper 😬
@rgaudin yes... one problem at a time.
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
Easy to fix with this doc https://medium.com/better-programming/running-a-container-with-a-non-root-user-e35830d1f42a
@kelson42, this requires a strategy and coordination with the zimfarm to work because we share common resources across various containers.
In a zimfarm worker, we have a root folder on the host, let's say /data.
- Manager needs:
  - rw access to this to ensure it's writable so we fail early if not.
  - ro to calculate available disk space
- Task Worker needs:
  - rw access to create its per-task workdir
  - rw access to remove it on cleanup
  - rw access to find and then remove ZIM files
  - ro to calculate available disk space
- Scraper needs:
  - rw access to store temp data and ZIM files.
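The two checks listed for the Manager (fail early if the root folder is not writable, and compute available disk space) could be sketched roughly as below. This is only an illustration and not the zimfarm's actual code: the function names `check_root_folder` and `available_bytes` are hypothetical, and a temporary directory stands in for /data.

```python
import os
import shutil
import tempfile


def check_root_folder(path):
    """Raise early if `path` is not a directory the current user can write to."""
    if not os.path.isdir(path):
        raise RuntimeError(f"{path} is not a directory")
    # os.access tests the real uid/gid of this process against the path.
    # W_OK | X_OK: we need to create entries in it and traverse it.
    if not os.access(path, os.W_OK | os.X_OK):
        raise PermissionError(f"{path} is not writable by uid {os.getuid()}")


def available_bytes(path):
    """Free space on the filesystem holding `path` (no write access needed)."""
    return shutil.disk_usage(path).free


if __name__ == "__main__":
    # A temporary directory stands in for the worker's root folder.
    with tempfile.TemporaryDirectory() as root:
        check_root_folder(root)
        print(available_bytes(root))
```

Note that when the process runs as root, `os.access` will report the folder as writable regardless of its mode bits, which is part of why the current root-everywhere setup never hits permission problems.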
In order for this to work we can either:
- use root everywhere and not care about permission (current behavior)
- use non-root, set 777 permission on root dir and require all future files and folder to stick to everybody permissions. Seems unrealistic and fragile.
- Use non-root on a coordinated uid.
This last option would mean hard-coding an arbitrary yet identical uid for the user to run as inside all of the scrapers' Dockerfile/entrypoint. We'd need each to be a good citizen in not accidentally doing stuff as root (if using gosu).
There would be implications on the standalone (non-zimfarm) use of the scrapers' images as this arbitrary user might not have correct permissions on destination.
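To make the coordinated-uid option a bit more concrete, here is a minimal sketch of what an entrypoint could do to step down from root before running the scraper, similar in spirit to gosu. The values of `SCRAPER_UID` and `SCRAPER_GID` are hypothetical (no value has been agreed on here), and `drop_privileges` is an illustrative name, not an existing helper.

```python
import os

# Hypothetical coordinated ids; every scraper image would need the same values.
SCRAPER_UID = 1000
SCRAPER_GID = 1000


def drop_privileges(uid=SCRAPER_UID, gid=SCRAPER_GID):
    """Switch the current process from root to uid/gid.

    Returns False without changing anything when not started as root.
    """
    if os.getuid() != 0:
        return False
    os.setgroups([])  # drop root's supplementary groups
    os.setgid(gid)    # group first: after setuid we may no longer change it
    os.setuid(uid)
    return True
```

Anything the entrypoint does before calling this still runs as root, so files created at that stage would end up owned by root in the shared folder.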
Note: we're using mounts on the host but using docker volumes would have the same constraints.
We wouldn't necessarily have to hard-code it but could set the PUID & PGID with environment variables. See https://docs.linuxserver.io/general/understanding-puid-and-pgid. I still don't know how this works, but it seems like it's set in their base image's init script: https://github.com/linuxserver/docker-baseimage-ubuntu/blob/bionic/root/etc/cont-init.d/10-adduser#L3
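As a rough sketch of the idea (not of how the linuxserver.io base image implements it), an entrypoint could read PUID and PGID from the environment and fall back to defaults. The function name `resolve_ids` and the default of 1000 are assumptions made for illustration.

```python
import os


def resolve_ids(environ=None, default_uid=1000, default_gid=1000):
    """Return (uid, gid) taken from PUID/PGID, falling back to the defaults."""
    env = os.environ if environ is None else environ
    uid = int(env.get("PUID", default_uid))
    gid = int(env.get("PGID", default_gid))
    if uid == 0:
        # The whole point is to avoid root, so refuse it explicitly.
        raise ValueError("PUID must not be 0 (root)")
    return uid, gid


if __name__ == "__main__":
    print(resolve_ids({"PUID": "1234", "PGID": "5678"}))
```

The worker would then only have to pass the same two variables to every container it starts.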
You can also use docker run --user and specify either a uid or a username (that exists inside the container) and that would do it. Probably a good approach for our use case.
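For illustration, this is roughly how a caller could build such a command line, defaulting to the uid/gid of the calling process. The image name, the /output mount point and the `docker_run_command` helper are hypothetical; only the `--user`, `-v` and `--rm` flags are standard docker run options.

```python
import os
import shlex


def docker_run_command(image, host_dir, uid=None, gid=None, container_dir="/output"):
    """Build a `docker run` argument list that runs the container as uid:gid."""
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    return [
        "docker", "run", "--rm",
        "--user", f"{uid}:{gid}",             # numeric ids need no matching user in the image
        "-v", f"{host_dir}:{container_dir}",  # bind-mount the shared host folder
        image,
    ]


if __name__ == "__main__":
    # "example/scraper" is a placeholder image name.
    print(shlex.join(docker_run_command("example/scraper", "/data", uid=1000, gid=1000)))
```

With a numeric uid, the process inside the container has no passwd entry or home directory unless the image provides one, which some tools may not like.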