modules icon indicating copy to clipboard operation
modules copied to clipboard

add API for "package dir"

Open ljharb opened this issue 5 years ago • 72 comments

See https://github.com/nodejs/node/issues/33460.

tl;dr: it turns out that there's a number of tools that need to be able to locate a package's root directory, and with "exports", they can no longer rely on package/package.json working, nor package/.

If we could provide this API, then CJS could path.join it, and ESM could use URL, to get whatever file the tool needs to fs.readFile.

Adding this new API would bypass questions about "should package.json be an implicit part of a package's API", as well as avoid reliance on ESM-only or CJS-only mechanisms. By providing the package root dir rather than its package.json, we would not be encouraging or discouraging any patterns of "where to store metadata" - instead, we'd be correctly leaving that choice up to userland.

Example solution: module.packageDir(specifier) and module.packageDirSync(specifier) (sync access is critical for many use cases).

ljharb avatar May 19 '20 23:05 ljharb

How would the resolver source be determined? Maybe require.packageDir would be a better place for the API so could have access to the callers __filename? For example you have the following packages installed:

node_modules/pkg1
node_modules/pkg1/node_modules/pkg2
node_modules/pkg2

node_modules/pkg1/index.js asks for packageDir of pkg2, it needs to find the nested one.

coreyfarrell avatar May 19 '20 23:05 coreyfarrell

Yes, that is true. It would need to do the searching contextually, just like require.resolve, so require.packageDir would be a better place to put it (and thus perhaps, for ESM, import.meta.packageDir)

ljharb avatar May 19 '20 23:05 ljharb

As paths are URLs in ESM, "dir" is perhaps not the best name. But aside from naming, yes, this would be a good method to expose. In https://github.com/nodejs/node/issues/33460#issuecomment-630968792 I had suggested resolvePackageRoot.

GeoffreyBooth avatar May 19 '20 23:05 GeoffreyBooth

I'm removing the modules agenda label here, as the discussion around this seems to be the same topic as #33460 which is still on the agenda as well.

guybedford avatar Aug 10 '20 18:08 guybedford

Any plans to resolve this?

stevenvachon avatar Feb 28 '21 00:02 stevenvachon

I had completely forgotten about this, but am still very much +1. Here's a way of going about doing this that may be worth considering. The constructor takes the specifier, resolves it, and populates the object with everything it knows about the package.

import { readDirSync } from 'fs';

const tapePkg = new Package('tape'); // resolves the identifier
const pkgDir = tapePkg.packageDir; // access the directory


console.log(readDirSync(pkgDir)); // log the resolved directory contents

Some research on other language approaches: https://docs.microsoft.com/en-us/dotnet/api/system.io.packaging.package?view=netcore-3.1

DerekNonGeneric avatar Feb 28 '21 01:02 DerekNonGeneric

That seems a bit overkill.

All that I think is needed is Module.getPackageDir(packageSpecifier), and then anything needed from there can be done with fs APIs.

I haven't yet had time to make a PR, but anyone is welcome to beat me to it.

ljharb avatar Feb 28 '21 05:02 ljharb

Module.getPackageDir(packageSpecifier) would need a specifier to know where it is starting resolution.

I also am unclear on how any API should integrate with redirection mechanisms like policies and import maps. I'd personally be fine ignoring redirection mechanisms but that could lead to problems where importing doesn't match the API.

bmeck avatar Feb 28 '21 13:02 bmeck

We could add a method to the require function so it knows where to start resolution.

targos avatar Feb 28 '21 13:02 targos

Or a function that takes two parameters. getPackageDir(__filename, packageSpecifier) in CJS and getPackageDir(import.meta.url, packageSpecifier) in ESM.

targos avatar Feb 28 '21 13:02 targos

I’d indeed expect a second argument to the static method that took __dirname/import.meta.url. Dangling it off require wouldn’t work in ESM.

I’m not sure how it would interact with policies, but i don’t think there’s any security issues from retrieving a path - if filesystem access is restricted, then you’d be unable to use it, which policies could dictate?

ljharb avatar Feb 28 '21 14:02 ljharb

@ljharb I am unclear / don't see an issue at a glance, but the problem can be explained by example:

// redirected to file:///app/node_modules/apm/interceptions/sql.js instead of file:///app/node_modules/sql/index.js
import.meta.resolve('sql');
// likely want file:///app/node_modules/sql/ (trailing slash, yes) ? but that now doesn't match where it is actually getting loaded from
module.package('sql');

This would affect pretty much any custom resolver the more I think about it, but we tend to state for policies that it is the job of the policy creator to manually enforce things like fs access per callsite (by redirecting as desired per scope) and similarly we can just state they need to customize module as well. This is a problem with a single argument form if it isn't an absolute location (path or URL) since it cannot be customized per scope. Similarly, without an absolute location, it wouldn't know what to resolve against, it could use process.cwd() but then things in node_modules can get the wrong values.

bmeck avatar Feb 28 '21 14:02 bmeck

Makes perfect sense. Theoretically, this api in node could use the same resolver that would respect policies, and could return the directory that contains that package's package.json - then it wouldn't require extra work on behalf of the policy creator.

The double argument form, with source location, was always the plan; i just failed to mention the second argument in my above comment.

ljharb avatar Feb 28 '21 15:02 ljharb

@ljharb if it respects policies, are you saying it should return file:///app/node_modules/apm/? since that is the package for the resolved "sql"? Policies are not realistically human writable (same as any non-trivial import map) so I wouldn't worry too much about work on creating since it can be left to tooling. Policies could always add data for this method to use rather than trying to make it work off the actual resolved location.

bmeck avatar Feb 28 '21 15:02 bmeck

It should return whatever path foo would make fs.readFileSync(path.join(foo, 'package.json')) return the expected package.json contents (presuming fs access is available).

ljharb avatar Feb 28 '21 15:02 ljharb

@ljharb I'm asking which package.json is expected

bmeck avatar Feb 28 '21 15:02 bmeck

That of the sql package, in this case. If the apm wants to provide a fake package dir with a custom package.json it is certainly free to construct one.

ljharb avatar Feb 28 '21 15:02 ljharb

@ljharb ah, file:///app/node_modules/sql/ in the example above then; by default it wouldn't resolve using the result of the current data in policies and a new data point would be needed to intercept that (same for loading hooks).

bmeck avatar Feb 28 '21 15:02 bmeck

Are there any objections to the following API?

module.getPackageDir(specifier, baseDir)

DerekNonGeneric avatar Mar 11 '21 04:03 DerekNonGeneric

precise naming aside, that's the only API we've ever really discussed or expected.

ljharb avatar Mar 11 '21 04:03 ljharb

module.getPackageDir(specifier, baseDir)

My preference would be to call it getPackageRoot, but otherwise it's fine. In general most docs these days prefer “folder” to “directory,” and also “root” covers you in case this returns an URL (as I assume it would for ESM).

Is the base necessary? Can it be optional, and if omitted it's relative to the current file/module?

GeoffreyBooth avatar Mar 11 '21 07:03 GeoffreyBooth

I'm fine with "getPackageRoot".

Because the API isn't context-aware (since it's on module and has to be imported, and isn't provided freshly for each module), it has no way of knowing the current file/module - so a base is necessary.

From CJS, you'd pass __dirname, and from ESM, import.meta.url.

ljharb avatar Mar 11 '21 14:03 ljharb

Question: How would such a function know what the real package directory is? Theoretically, and practically, there could be a "mini" package.json that is there just for a type: module usage.

giltayar avatar Mar 11 '21 14:03 giltayar

@giltayar node's algorithm knows that, because it only looks at the real package directory's package.json for "exports" - and that's the precise knowledge that this API will expose.

ljharb avatar Mar 11 '21 14:03 ljharb

@ljharb I do not believe it recursively searches for a package containing "exports" I believe given our documentation it stops on the first package boundary not for a module located in node_modules/(@.../)?...

bmeck avatar Mar 11 '21 15:03 bmeck

Maybe I misunderstood the last comment since the resolver wouldn't be recursing up to find the source of a deep import's package boundary with this API but instead resolving the deep import and returning the package.json which performed the last step of the resolve? Maybe we could name this API more clearly.

bmeck avatar Mar 11 '21 15:03 bmeck

It’s very explicitly intended to recurse up and find the package boundary; it’s not particularly useful to know the package.json for a given file but it’s necessary to be able to find the package boundary itself.

A different name is fine; it’s the semantic that’s important.

ljharb avatar Mar 11 '21 15:03 ljharb

@ljharb the nearest package boundary stops at the first package.json (recursing parent directories) but "exports" check is only during node_modules traversal (only recurses to find the module by a name). I'm now confused on the desired semantics.

bmeck avatar Mar 11 '21 15:03 bmeck

The problem is that when a package has “exports” (ie, only when it’s in node_modules) there’s no reliable way to get to the package.json file that has “exports” in it. This api is intended only to solve that problem.

it would be fine, for example, if it rejected when the specifier passed to it wasn’t a package name that’s in node_modules (even on a deep file path) since the only important goal is to get the “exports” field’s package.json’s directory - the package root.

ljharb avatar Mar 11 '21 15:03 ljharb

that seems fine, my complaint is we don't define a "package root" currently. as @giltayar points out you can have multiple package boundaries:

/app
/app/node_modules/logger/package.json - package boundary at /app/node_modules/logger
/app/node_modules/logger/esm/package.json - package boundary at /app/node_modules/logger/esm , in theory can contain "exports"
/app/node_modules/logger/cjs/package.json - package boundary at /app/node_modules/logger/cjs , in theory can contain "exports"

if /app/main.js wishes to know which package boundary performed the "exports" resolution of logger/logger.js we get into the confusion.

It sounds like you are trying to define "package root" to be unrelated to the actual filesystem from a traversal after resolution point of view. You want to get /app/node_modules/logger/package.json. This is a short circuiting instead of post resolution recursion. I am not really sure "package root" is a clear term so I think the "exporting package" might be better. For example if we add a redirect:

/app/node_modules/cjs-logger/ -> /app/node_modules/logger/cjs/

Would we expect using the API on cjs-logger/logger.js to return /app/node_modules/logger/cjs/package.json? It seems so from what I'm reading in the comment above. Stating the "right" one isn't really clear and it seems "the one with exports" doesn't make sense if multiple may contain exports. The one which had its "exports" used however is unique and non-reentrant.

bmeck avatar Mar 11 '21 16:03 bmeck