node icon indicating copy to clipboard operation
node copied to clipboard

Look for node_ceiling

Open arhart opened this issue 1 year ago ā€¢ 29 comments

It is easy to accidentally allow another user to influence what code node loads and executes. Details can be found at HackerOne reports 1564437 (CommonJS module loading), 1564444 (ECMAScript module resolution), and 1564445 (package.json). While these behaviors are documented, the security implications are easy to overlook.

By stopping the search for node_modules and package.json after encountering a node_ceiling file, this change allows an application to be protected with few changes to the file system and a high degree of compatibility.

arhart avatar Jul 13 '22 20:07 arhart

Review requested:

  • [ ] @nodejs/modules

nodejs-github-bot avatar Jul 13 '22 20:07 nodejs-github-bot

Some tests are failing, even before my changes.

arhart avatar Jul 13 '22 20:07 arhart

Refs: https://github.com/nodejs/node/issues/43192 Refs: https://github.com/nodejs/node/issues/43368

arhart avatar Jul 13 '22 20:07 arhart

This seems exceedingly premature prior to a lot more discussion.

Changing the CJS module resolution algorithm - especially with "Stability 2" - will impact a ton of tools in the ecosystem, including bundlers, transpilers, packages like resolve and enhanced-resolve, etc. Such a change should have extensive discussion with stakeholders, and broadcasting, long before writing any code.

ljharb avatar Jul 13 '22 20:07 ljharb

@ljharb there has been some discussion in the referenced issue, but I hoped having code to play with might facilitate additional discussion.

arhart avatar Jul 13 '22 20:07 arhart

and that discussion largely seems to indicate that this change shouldn't be made, except via loaders.

ljharb avatar Jul 13 '22 20:07 ljharb

Even loaders somewhat want something like this outside loaders. I like the use case and it is possible to do this with policies today to error outside a directory but policies lack the ability to stop searching at a directory boundary. It would be good to differentiate error vs short circuit here.

On Wed, Jul 13, 2022, 3:47 PM Jordan Harband @.***> wrote:

and that discussion largely seems to indicate that this change shouldn't be made, except via loaders.

ā€” Reply to this email directly, view it on GitHub https://github.com/nodejs/node/pull/43828#issuecomment-1183662413, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABZJIZ2VHSAFAAMTWHAABDVT4TNJANCNFSM53P7D6ZQ . You are receiving this because you are on a team that was mentioned.Message ID: @.***>

bmeck avatar Jul 13 '22 21:07 bmeck

Did you consider just using an environment variable for this like NODE_PROJECT_ROOT or something?

guybedford avatar Jul 13 '22 22:07 guybedford

An env would be nicer as it cannot be removed from disk to circumvent things.

On Wed, Jul 13, 2022, 5:36 PM Guy Bedford @.***> wrote:

Did you consider just using an environment variable for this like NODE_PROJECT_ROOT or something?

ā€” Reply to this email directly, view it on GitHub https://github.com/nodejs/node/pull/43828#issuecomment-1183749511, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABZJI5A3LMZG6ZWHJR52ALVT5AG7ANCNFSM53P7D6ZQ . You are receiving this because you are on a team that was mentioned.Message ID: @.***>

bmeck avatar Jul 13 '22 23:07 bmeck

@guybedford @bmeck When considering policies for issue 43192 , I considered command line arguments, environment variables, and a file-based indicator. Some projects, such as NPM, spawn child processes which either directly or transitively exec additional node processes. Passing an extra command line argument potentially requires modifying code across projects. Passing an environment variable (such as NODE_OPTIONS) reaches some invocations that command line arguments can't easily reach, although some programs intentionally strip out environment variables. Both environment variables and command line arguments

  1. rely on intermediate code passing them along as intended (there is potentially a LOT of intermediate code to consider)
  2. are difficult to observe, especially for short-lived processes
  3. are difficult to ensure don't accidentally stop behaving as intended, across dependency upgrades, for example

The using something in the file system OTOH

  1. needn't rely on intermediate code leaving unchanged if the process doesn't have permissions to modify the relevant files
  2. is easy to observe statically as part of an RPM, tarball, ZIP, or other package used to install the program or directly on the filesystem after install and/or during development
  3. can be protected from accidental or intentional modification using normal system tools (chmod, chown, mount options, etc.)
  4. if structured to match the existing search paths for node_modules and package.json, can follow the same rules used for existing resolution, allowing for different search termination for different files (just like different files today can have different package.json files and/or walk a different path toward the root due to symlinks).
  5. could be adopted by system-wide install locations, protecting common scripts like npm, mocha, node-gyp, etc. Unlike environment variables or command-line options which would seem to require some sort of merge-algorithm to handle dependencies installed in multiple locations (main app, system-wide, user home directory, ...), a file-system approach has a built-in way to distribute the information that applies to the different install locations.

I also considered placing something inside package.json or inside node_modules, but it seemed useful to be able to terminate the search for either without the presence of the other. Also, the search for package.json terminates at the first one located, while the search for node_modules can continue beyond a package.json, so a new file seemed useful.

I chose to look specifically for a file rather than a file-or-directory because it seemed useful to have a single pattern. Either would allow future expansion (file contents or new entries in a directory), but having a mix of both sounded needlessly complicated.

The name seemed reasonably meaningful, didn't commit to a file-format when it's currently only a present or not-present boolean, and is the same length as node_modules (so it's probably OK on any system node currently searches for node_modules).

arhart avatar Jul 14 '22 20:07 arhart

Would it be useful to have an environment variable in addition to looking in the file system? If so, there would be questions of how multiple paths are joined into the variable and how they are normalized (or not) and compared with the search paths, but I think both approaches could coexist.

arhart avatar Jul 15 '22 02:07 arhart

Environment variables have the problem that they may not get propagated to child processes. Ex. child_process.fork().

bnoordhuis avatar Jul 17 '22 11:07 bnoordhuis

If policies were automatically looked up it would solve the problem as well. They intentionally do not compose though. If we go for files I would prefer that approach as it is fewer features to keep track of.

On Sun, Jul 17, 2022, 6:16 AM Ben Noordhuis @.***> wrote:

Environment variables have the problem that they may not get propagated to child processes. Ex. child_process.fork().

ā€” Reply to this email directly, view it on GitHub https://github.com/nodejs/node/pull/43828#issuecomment-1186486253, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABZJI7PH5H7GZMGAYADUA3VUPTPPANCNFSM53P7D6ZQ . You are receiving this because you were mentioned.Message ID: @.***>

bmeck avatar Jul 17 '22 12:07 bmeck

If policies were automatically looked up it would solve the problem as well. They intentionally do not compose though. If we go for files I would prefer that approach as it is fewer features to keep track of.

Policies offer options like dependency redirection. Any proposal would need to avoid automatically loading a policy under malicious control. Are there any proposals in that direction?

The code in this pull request only prunes the search path and only looks in directories that have already been examined for node_modules or package.json.

arhart avatar Jul 17 '22 13:07 arhart

Policies have their own integrity flag provided via CLI.

On Sun, Jul 17, 2022, 8:37 AM Andrew Hart @.***> wrote:

If policies were automatically looked up it would solve the problem as well. They intentionally do not compose though. If we go for files I would prefer that approach as it is fewer features to keep track of.

Policies offer options like dependency redirection. Any proposal would need to be avoid automatically loading a policy under malicious control. Are there any proposals in that direction?

The code in this pull request only prunes the search path and only looks in directories that have already been examined.

ā€” Reply to this email directly, view it on GitHub https://github.com/nodejs/node/pull/43828#issuecomment-1186520871, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABZJI56MKKT7XAUGICLSX3VUQEB3ANCNFSM53P7D6ZQ . You are receiving this because you were mentioned.Message ID: @.***>

bmeck avatar Jul 17 '22 16:07 bmeck

Like environment variables, CLI arguments don't necessarily get passed to child processes.

arhart avatar Jul 17 '22 16:07 arhart

I don't have any strong opinions here but if we do end up using a file sentinel I'd suggest package.json with "root": true, like how eslint works.

devsnek avatar Jul 17 '22 16:07 devsnek

I like that idea much better - it mirrors private: true, it's in the place node already has to look, and the root of a project is almost certain to already have a package.json.

ljharb avatar Jul 17 '22 16:07 ljharb

Using a field in package.json might not be as convenient as it first sounds.

  1. Libraries often ship the same package.json that is used during development, but using a field in package.json might prevent that.
    1. During development it might be desired to stop searching for node_modules at its package.json location, but when the library is installed that might not be appropriate (depending how which package manager with which options installed it).
  2. The desired location might not be somewhere node already looks for package.json
    1. Node currently stops looking for a package.json once it has found one.
    2. Consider a pnpm install of express in a project called topproject. Express might be found in topproject/node_modules/express as a symlink to .pnpm/[email protected]/node_modules/express. It might find all of its dependencies under topproject/node_modules/.pnpm/[email protected]/node_modules/* and want to stop searching for node_modules at topproject/node_modules/.pnpm/[email protected]/, which doesn't currently have a package.json (but it could be added; so could a node_ceiling). It seems unlikely node would currently look for a package.json there because it is likely all the files further under [email protected]/ already have a nearer package.json. It seems it might be desirable to stop there rather than continuing to topproject/node_modules, for example to prevent packages that are direct dependencies of topproject from being found when required-but-not-declared by other (possibly transitive) dependencies of topproject. I could imagine pnpm might want to support both use cases of stopping at topproject/node_modules/.pnpm/[email protected]/ or at topproject/.
    3. Node could be made to look for package.json there specifically to look for a "root": true, but would that cause more confusion than a distinct filename for that purpose?

arhart avatar Jul 22 '22 15:07 arhart

This feature comes down to specifying a single path only within which Node.js should execute code - I think rather than a file signifier, we should just have an out-of-band way to achieve that such as a flag or environment variable or configuration file.

There are a number of things in Node.js at this point that want to just be applied ambiently through configuration - loaders, hooks, instrumentation, policies, custom flags. Perhaps this is another data point on the journey to having a generic configuration file for Node.js invocations that is looked up on execution. If such a thing were to be designed, having a key for this configuration seems like a good solution to me.

So starting from a flag / environment variable I think we can get there. But I think this PR has a better chance of landing if it were to take this approach rather than introduce a new pattern which provides another statting reliance and pattern for people to learn.

guybedford avatar Jul 22 '22 21:07 guybedford

@guybedford could you elaborate on your single path idea? With Node.js looking in so many paths for dependencies, especially with symlink handling, perhaps you have a simplified or recommended pattern in mind?

arhart avatar Jul 23 '22 12:07 arhart

@arhart sure, if you're after an explicit suggestion I can certainly provide that. My simplified or recommended pattern for implementation would be:

node --ceiling-path="./app" --ceiling-path="./"

Possibly alternatively supporting some ; or : separator form for a combined flag:

node --ceiling-paths=".;app"

In due course then extending support for such a flag through project-level / app-level configuration system. Name can be bikeshed further of course.

This is just my preference, it's an open process and you have no explicit blocks, so it is your decision how to navigate further of course.

guybedford avatar Jul 23 '22 23:07 guybedford

@guybedford Thanks. I think I see the direction you are suggesting.

While some sort of mechanism for ambient configuration of Node.js seems useful, it also seems like a long route to walk to get there.

arhart avatar Aug 09 '22 22:08 arhart

I wouldn't characterize this feature as specifying path(s) within which Node.js should execute code. Instead, I'd say it's about stopping module resolution at particular path(s).

If an application dynamically downloads and loads a module, then it seems good for the path(s) [where] module resolution stops to be similarly flexible.

arhart avatar Aug 09 '22 22:08 arhart

An install tool (ex: npm) knows where it is placing modules and their dependencies. It might not have visibility to the larger application structure or even how many applications use those modules.

arhart avatar Aug 09 '22 23:08 arhart

Iā€™m very hesitant to modify the CommonJS resolution algorithm at this point. There are so many other tools that depend on it and reimplement it, like many of the bundlers out there. I would encourage you to implement this idea as a loader, at least at first; that can prove it its viability, and then you can make the argument that it should be part of Node core.

GeoffreyBooth avatar Aug 09 '22 23:08 GeoffreyBooth

@GeoffreyBooth is that a general hesitancy to change something so central or are there also specific reasons to think such a change would not be compatible with existing code?

Implementing as a loader could be interesting, at least the parts that loaders currently allow. My understanding is that loaders can only be setup by command line arguments and can't change how CommonJS modules are loaded. What do you think might be learned by having a partial loader implementation rather than the more complete branch in this pull request?

arhart avatar Aug 09 '22 23:08 arhart

@GeoffreyBooth is that a general hesitancy to change something so central or are there also specific reasons to think such a change would not be compatible with existing code?

Changes to the resolution algorithm impose a heavy burden on the community. Many tools, especially build tools, reimplement the resolution algorithm and therefore would need to be updated to accommodate this change. Before we impose that cost on everyone, Iā€™d like some indication that anyone beyond you would use this feature. Shipping it as a loader (using the ESM hooks for modifying ESM resolution, and monkey-patching require to update the CommonJS resolution) and seeing it find adoption would prove that this is something that should become part of core, and that all the other affected tools are updating for a worthwhile reason.

GeoffreyBooth avatar Aug 09 '22 23:08 GeoffreyBooth

I wouldn't characterize this feature as specifying path(s) within which Node.js should execute code. Instead, I'd say it's about stopping module resolution at particular path(s).

In both forms, it is and always is just a list of paths, just with different sources of truth over which paths constitute stop paths. Roundabout paths to goals are unfortunately usually necessary!

guybedford avatar Aug 10 '22 03:08 guybedford

Hi :wave: thanks for considering a feature like this! Some package managers like yarn LTS (3.x) allow an install config to define the ceiling during installs (which should override the hoisting limits but I'm not sure if it's for ESM loading or just CommonJS?).

So, could Node also look at those config options in package.json when it does it's module resolution? I think if someone has a file tree like:

app.js
node_modules/ <-- contains lib/v1.js
workspace_a/
  - node_modules/  <-- contains lib/v2.js
  - file1.js
  - package.json

And inside workspace_a's package.json, is:

{
  ...
  "nmHoistingLimits": "workspace",
  ...
}

I would want this to mean: when I do import * from 'lib', I would want to not search the parent node_modules, only the lib in the workspace.

For the time being, I have to use the resolve hooks:

import { resolve as pResolve } from 'path';

const parentDir = pResolve('..');
const ignoreDir = `file://${parentDir}/node_modules`;

export async function resolve(specifier, context, nextResolve) {
  let localPkg
  if (specifier.startsWith(ignoreDir)) { // If the dep is tin the parent...
    const pkg = specifier.match(/node_modules\/([^\/]+)\//)[1] // pull out the pkg name...
    localPkg = await import.meta.resolve(pkg).catch(() => {}) // and only use the local one, not the parent's
  }
  return nextResolve(localPkg || specifier); // return the local one if found, otherwise carry on...
}

richardeschloss avatar Sep 16 '22 16:09 richardeschloss