rfcs
rfcs copied to clipboard
[RRFC] Clean up file ownership story
We currently have an entire category of issue reports around unexpected changing of file ownership. We need a place to start the discussion on what we can do to improve the story around file ownership management: Maybe we need less of it these days?
Opening this RRFC so that we can have a place to share thoughts, link related issues to and hopefully eventually evolve towards an RFC if we come up to the decision that changes are necessary.
cc @nlf @fritzy
Auditing and aligning this sounds great.
I would expect that npm has full ownership (and thus, full freedom to change whatever it likes) of any node_modules
, and $HOME/.cache
, and things inside npm root -g
, and its own installation folder.
Things a user is explicitly asking npm to change - .npmrc
, package.json
, lockfiles - npm of course can change, but shouldn't change ownership (it should error, if that is required; warn if not)
I would also expect that any other file is something npm is not "allowed" to change in any way.
AFAIK the existing file ownership changes are so that things like sudo npm install
in a directory that's owned by your user doesn't write a bunch of files owned by root, both to your project and potentially to your cache.
This doesn't feel like something we should be attempting to forcibly fix on a user's behalf. The stat calls to determine ownership, and the chown calls to modify the ownership when it differs from what we infer are not free. We are most certainly taking a performance hit with these, especially in environments like Docker where filesystem i/o can be significantly slower.
Attempting to change permissions of things like the directory global bin scripts are located in can be absolutely disastrous.
Is there prior art for the type of permissions changes we attempt? I suspect there's not, but personally I also pretty strongly believe we shouldn't be attempting to change ownership ourselves.
Is it possible to make sudo npm install
immediately exit with an error?
potentially, but that wouldn't stop things like a user elevating their permissions to root manually and installing there. we absolutely can't explode when npm is run as root unconditionally, as that's often the case in a containerized environment.
a suggestion: what if npm doctor
can help diagnose this class of problem? i.e. if some other npm command errors with something related to filesystem permissions, our error output can suggest the user runs npm doctor
. that would then do things like examine the permissions of files in node_modules
and the cache directory and either link to docs or display in-line how the user can correct the permissions themselves
after actually running npm doctor
myself for the first time in quite a while, it already does permissions checks
Perms check on cached files ok
Perms check on local node_modules ok
Perms check on global node_modules ok
Perms check on local bin folder ok
Perms check on global bin folder ok
so all that's missing is making our error output suggest running doctor, and doctor telling you how to fix the problem
We could have a --no-root-user
default for non-global installs, and then ci users would have to run --root-user
. In the same release, we'd drop all ownership checks, and just write files as the current user. This would, of course, be a major breaking change.
ENOROOT: running `npm` as root is discouraged. Use `--root-user`to override. [link to docs]
adding this as another data point https://github.com/npm/cli/issues/4312
Rather than dropping perms to nobody in this circumstance (which breaks compilation or yields surprising ownership issues), can we drop perms to $SUDO_USER instead?
Sent from my iPhone
On Mar 3, 2022, at 19:08, Nathan Fritz @.***> wrote:
We could have a --no-root-user default for non-global installs, and then ci users would have to run --root-user. In the same release, we'd drop all ownership checks, and just write files as the current user. This would, of course, be a major breaking change.
ENOROOT: running npm
as root is discouraged. Use --root-user
to
override. [link to docs]
— Reply to this email directly, view it on GitHub https://github.com/npm/rfcs/issues/546#issuecomment-1058686129, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABTIUCYLNTMJ6UH4VVY2R3U6FH7LANCNFSM5P3WILHA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
You are receiving this because you are subscribed to this thread.Message ID: @.***>
Rather than dropping perms to nobody in this circumstance (which breaks compilation or yields surprising ownership issues), can we drop perms to $SUDO_USER instead?
I think that's not a good idea, not everyone is using sudo. If I ssh root@myserver
this var won't be set because I'm "really" root. This is not a good practice, but it's the default on many hosters
Adding two more references :
- https://github.com/npm/cli/issues/4010
- https://github.com/npm/cli/issues/3705
how about this for a default behavior:
flowchart TD
A[root runs command] --> B{SUDO_UID/SUDO_GID are set?};
B -- Yes --> C[setuid/setgid to SUDO_UID/SUDO_GID] --> E;
B -- No --> D{--allow-root passed?};
D -- Yes --> Z[log warning] --> E[proceed running command];
D -- No --> F[log error, exit];
when logging an error and exiting, we could use a message along the lines of
npm ERR Running npm commands as root is considered dangerous.
To run as a different user, you may provide the '--uid' and '--gid' flags.
If you're absolutely sure you want to run this command as root use the '--allow-root' flag.
For more information: https://some/link/to/the/docs
and the warning could read something like
npm WARN Running npm as root. For more information: https://some/link/to/the/docs
@nlf You didn't wrote it, but I assume that you are suggesting the above default behavior + removing npm chmod ?
If that's the case it seems pretty nice ! It will break a lot of misconfigured docker containers but I think this is not a problem since no use application should be run as root, even in a container.
People will discover the USER docker directive and that's for the best :)
IMO a flag would be a simple yet effective solution to this problem. I'm thinking about something like --no-infer
flag that simply skip the inferOwner
call here
Leaving this on the agenda for today, @nlf would be great if you could speak bit to your diagram/approach discussed here in the call but understand if you can't make it.
I'm extremely frustrated with this permission decision. Docker volume mounts use the UID:GID of the host
machine. The changes npm has made to infer the execution user from the UID:GID effectively break docker setups where the host users UID:GID does not match the node user on the container. Setting UID:GID in a .env file for each developer in our application is cumbersome and overall ridiculous. Stop trying to infer best security practices when it comes to running scripts, your job is to be a package manager
I'm extremely frustrated with this permission decision. Docker volume mounts use the UID:GID of the
host
machine. The changes npm has made to infer the execution user from the UID:GID effectively break docker setups where the host users UID:GID does not match the node user on the container. Setting UID:GID in a .env file for each developer in our application is cumbersome and overall ridiculous. Stop trying to infer best security practices when it comes to running scripts, your job is to be apackage manager
to be very clear this permissions decision was made many years ago, it is by no means a new thing, and that's exactly why this issue exists. this is something that we know to cause problems, so we're trying to determine the best way to fix that without immediately causing builds to break for people who have worked around the problem already. we'll be starting to roll out some changes to help folks in your situation in the near future.
Actions
- [ ] @nlf / @ruyadorno to make an RFC for these changes
Hey all,
having spent a whole day debugging an issue caused by silently setting the effective UID of npm subprocesses to <owner of the current directory>
yesterday I'd like to propose an alternative to this behavior:
we absolutely can't explode when npm is run as root unconditionally, as that's often the case in a containerized environment
That's fair, but how about exploding when npm is run as root and the current directory is owned by non-root? This seems strictly preferable to silently switching users.
I also agree that silently switching users is a great source of pain. I for example lost about one hour because of this. Please find a solution that stops doing this, thank you all very much :)
I think npm shouldn't be looking at its own UID, it should not modify file ownerships and permissions, and it should specially not switch users after it's called. That last point, when run as root, breaks long standing assumptions of a process behavior.
It's up to the caller of npm to decide what user they want to run it as, even if their decision is bad.
If I run npm install
as root, npm will create files in node_modules and in my cache that belong to root. That is not a bug in npm, and it's not something for npm to fix. If I don't want to have those files owned by root, then I shouldn't be running npm install
as root to begin with.
@javabudd We face(d) a similar issue.
I'm extremely frustrated with this permission decision. Docker volume mounts use the UID:GID of the
host
machine. The changes npm has made to infer the execution user from the UID:GID effectively break docker setups where the host users UID:GID does not match the node user on the container. Setting UID:GID in a .env file for each developer in our application is cumbersome and overall ridiculous. Stop trying to infer best security practices when it comes to running scripts, your job is to be apackage manager
echo "module.exports.sync = function() { return { uid: 0, gid: 0 } };" >> $(npm root -g)/npm/node_modules/infer-owner/index.js
This is the workaround we cooked up on a ubuntu based docker (so your path may be different)
It modifies the behavior of how npm detects which user it should run as. We this issue when using webpack in a VOLUME mount, while writing the output to a another, in-container only folder owned by root. In our case, forcing UID=0 is enough, so we echo append that to the module.
(We are still evaluating this solution, but looks workable)
Still unclear, why NPM tries to lchown files that have nothing to do with Node. In my case, it is a link in /usr/local/bin to Microsoft Defender Antivirus. Please fix it.
% sudo npm install -g [email protected] Password: npm ERR! code EPERM npm ERR! syscall lchown npm ERR! path /usr/local/bin/mdatp npm ERR! errno -1 npm ERR! Error: EPERM: operation not permitted, lchown '/usr/local/bin/mdatp' npm ERR! [Error: EPERM: operation not permitted, lchown '/usr/local/bin/mdatp'] { npm ERR! errno: -1, npm ERR! code: 'EPERM', npm ERR! syscall: 'lchown', npm ERR! path: '/usr/local/bin/mdatp' npm ERR! }
% ls -l /usr/local/bin/mdatp lrwxr-xr-x 1 root admin 78 Aug 23 13:37 /usr/local/bin/mdatp -> /Applications/Microsoft Defender.app/Contents/Resources/Tools/wdavdaemonclient
From what I can see, it goes back to the times when it was suggested that npm
should be run as root
. The idea was that then npm
can switch to nobody
when running package scripts. A couple more of relevant links.
Was that a good idea or not?.. I personally think that npm
shouldn't decide under which user to run whatever a user wants to run (or at least provide an escape hatch, at some point it existed). That makes npm
more complex, its behavior more nuanced, and it's not npm
's responsibility (npm
is not in a position to do that).
I was beaten by it a couple of times, and it always took me a while to figure out what's going on. Today it was like this (in a docker
container):
$ npx serve
sh: serve: Permission denied
sh
? Permission denied?
If that's the case it seems pretty nice ! It will break a lot of misconfigured docker containers but I think this is not a problem since no use application should be run as root, even in a container.
People will discover the USER docker directive and that's for the best :)
So let's say I have a docker-compose
project that bind mounts .
into a container (to sync what's on the host and what's inside a container). And to make it work I create a user in Dockerfile
with UID 1000 (which happens to be my UID on the host), give it to my colleague (which has UID 1001 on his/her host), and now what? I have to make people specify their UID to run a project? Doesn't it look like npm
takes too much responsibility only to make peoples' lives harder? Compare it to the case where npm
doesn't switch users and I have nothing to do on my part.
how about exploding when npm is run as root and the current directory is owned by non-root? This seems strictly preferable to silently switching users.
see the above case
echo "module.exports.sync = function() { return { uid: 0, gid: 0 } };" >> $(npm root -g)/npm/node_modules/infer-owner/index.js
That's a rather radical way to deal with it, but in a way I like it :)
we're trying to determine the best way to fix that without immediately causing builds to break for people who have worked around the problem already
Do you have an idea how to work around it? (Or you're just trying to make it as close to the previous behavior as possible?) After dropping unsafe-perm
the only way is probably the one above. So, if we do it quick, noone will notice :) Or why not just drop the whole infer-owner/switch-users thing in the next release? I bet the builds that'll break... that'll make people happy that one workaround less is needed.
Or why not just drop the whole infer-owner/switch-users thing in the next release?
This is going to be the behavior in npm 9. Wednesday's prerelease should have this available for testing.
we have stopped altering file ownership at runtime on the user's behalf in npm@9 entirely.
i think we have some follow up work to do here to try to help users not end up in a situation where they have files unexpectedly owned by root. things like disabling the cache and log files if the cache directory is not owned by the same user that's running npm would go a long way towards not making users have to keep running chown -R
to fix ownership
Thanks for implementing this long overdue change. At last npm is more predictable if ran from root.
However, this is caused us a lot of pain. :) Indeed, our docker builds started failing as soon as we started using npm v9. The official Node documentation does not work any more in some scenarios: https://nodejs.org/en/docs/guides/nodejs-docker-webapp/
For others, who stumbled upon Docker build errors like
npm WARN tar TAR_ENTRY_ERROR EINVAL: invalid argument, fchown
or (for multi stage builds)
failed to copy files: failed to copy directory: Error processing tar file(exit status 1): Container ID 166037 cannot be mapped to a host ID
here is how I fixed it.
The build container should start from this:
FROM node:18 as builder
# Where all the commands are going to be executed
WORKDIR /usr/src/app
# Create user `node`
USER node
# The npm v9 does not automatically chown directofy for us, so we have to do it ourselves
USER root
RUN chown -R node /usr/src/app
# This would run any following npm commands NOT from root. It is highly recommended to never run npm as the root user.
USER node
# Copy all the files from your host (laptop?) current directory (except .dockerignore file matches)
COPY . .
RUN npm ci
And end with this:
# This will make sure the the next COPY --from=build docker command can be successfully executed
USER root
RUN chown -R root:root /usr/src/app
The next container in the multi-stage docker build should be like this:
FROM node:18-alpine
WORKDIR /usr/src/app
# ... other commands go here ...
# Copy files from previous build stage to this (last) build stage
COPY --from=builder /usr/src/app .
# Run the process under this user (because it is root by default)
USER node
Hope this helps someone in the future.
@koresar Copying files from Debian to Alpine? That won't work if any of your packages contain anything binary.
USER node
doesn't create a user, it switches to a user if it exists. Try USER non-existent-user
.
A lot of comments (Clean Code, chapter 4) but the really unclear part is uncommented:
# This will make sure the the next COPY --from=build docker command can be successfully executed
USER root
RUN chown -R root:root /usr/src/app
That is, there's a comment but it doesn't tell why one should avoid doing that. From what I can see, there should be no problem with copying files owned by non-root
. And the owner of the destination files is determined on the receiving end.
The way I'd do it (a gist):
FROM node:18-alpine as node_modules
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm install
FROM node:18-alpine
WORKDIR /app
RUN chown node: .
COPY --chown=node --from=node_modules /app/node_modules node_modules
COPY --chown=node . .
USER node
But okay. That's npm@8
. It however still works with npm@9
:
FROM alpine:3.17 AS node_modules
RUN apk add nodejs npm
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm install
FROM alpine:3.17
RUN apk add nodejs shadow && useradd node
WORKDIR /app
RUN chown node: .
COPY --chown=node --from=node_modules /app/node_modules node_modules
COPY --chown=node . .
USER node
All in all, I don't rule out that I'm missing something here, but the way it looks to me at the moment, in addition to the first issue (Debian -> Alpine), there's a lot of unnecessary movement in your solution. And mine is a better starting point. But if I'm missing something, can you possibly tell under which conditions it'll fail? How to reproduce the issues you mentioned? Imagine a person running into them. Should he/she change the whole file to fix them? Instead of "add this line," "change this line", or "don't do this because..."?
For past 9 years of using Dockerfile(s) I thought USER node
would crate me a user and a group.
I tried USER nonexistentuser
as you suggested. And it worked flawlessly. Did you try it yourself? :)
My base image is node:18
(plus its alpine version) because it's the one recommended by OpenJS Foundation. I'm not going to bake my own images with apk
and whatnot. Too laborious.
Thank you for your COPY --chown=node
tip. That's handy.
Let's not drive away the discussion in this issue from the npm
to my coding challenges.
I appreciate your help.
@koresar
I tried USER nonexistentuser as you suggested. And it worked flawlessly. Did you try it yourself? :)
FROM node:18-alpine
USER nonexistentuser
RUN set -x && whoami && id
$ docker build .
...
Step 3/3 : RUN set -x && whoami && id
---> Running in 5eb31625a61b
unable to find user nonexistentuser: no matching entries in passwd file
Also, from the official docs:
The
USER
instruction sets the user name (or UID) and optionally the user group (or GID) to use as the default user and group for the remainder of the current stage.
--
My base image is
node:18
(plus its alpine version) because it's the one recommended by OpenJS Foundation. I'm not going to bake my own images withapk
and whatnot. Too laborious.
As you probably noticed, my main suggestion was to use node:18-alpine
. But apparently my node:18-alpine
image had npm@8
, and I didn't think to try anything more specific (like node:18.14.0-alpine
), so for the sake of the argument I used alpine:3.17
. But generally yes, stick to node:...
until you can't.
Let's not drive away the discussion in this issue from the npm to my coding challenges.
Coding habits probably? Actually, I was planning to give a more detailed reply back than, but then decided to focus on what I thought was important. Also, I think it's often hard to draw the line between what one thinks is important and what is considered important. So everything here should be taken as my personal opinion, which you might share or not.
One more thing that came to mind. You said, "For those who stumbled into the following issues... Here's how I fixed it: ..." And then you provide what looks like a Dockerfile
to start from. Not a diff.
The point being, if you wanted to help people who stumbled into specific issues, you should have told what you changed. Like, "I had this line here, and after I changed it to this, it resolved." Ideally, you should have provided "steps to reproduce the issues," or something along those lines.
If you wanted to provide a Dockerfile
to start with (a starting-point Dockerfile
), then that is a different story.
And to me it looks like you more or less failed at both (no offense meant). The solution is not clear (what exactly solves the issues). And it's not clear what's missing in my version of Dockerfile
. But well, it can probably provide some ideas to both ends.