modules
modules copied to clipboard
add API for "package dir"
See https://github.com/nodejs/node/issues/33460.
tl;dr: it turns out that there's a number of tools that need to be able to locate a package's root directory, and with "exports", they can no longer rely on package/package.json working, nor package/.
If we could provide this API, then CJS could path.join it, and ESM could use URL, to get whatever file the tool needs to fs.readFile.
Adding this new API would bypass questions about "should package.json be an implicit part of a package's API", as well as avoid reliance on ESM-only or CJS-only mechanisms. By providing the package root dir rather than its package.json, we would not be encouraging or discouraging any patterns of "where to store metadata" - instead, we'd be correctly leaving that choice up to userland.
Example solution: module.packageDir(specifier) and module.packageDirSync(specifier) (sync access is critical for many use cases).
How would the resolver source be determined? Maybe require.packageDir would be a better place for the API so could have access to the callers __filename? For example you have the following packages installed:
node_modules/pkg1
node_modules/pkg1/node_modules/pkg2
node_modules/pkg2
node_modules/pkg1/index.js asks for packageDir of pkg2, it needs to find the nested one.
Yes, that is true. It would need to do the searching contextually, just like require.resolve, so require.packageDir would be a better place to put it (and thus perhaps, for ESM, import.meta.packageDir)
As paths are URLs in ESM, "dir" is perhaps not the best name. But aside from naming, yes, this would be a good method to expose. In https://github.com/nodejs/node/issues/33460#issuecomment-630968792 I had suggested resolvePackageRoot.
I'm removing the modules agenda label here, as the discussion around this seems to be the same topic as #33460 which is still on the agenda as well.
Any plans to resolve this?
I had completely forgotten about this, but am still very much +1. Here's a way of going about doing this that may be worth considering. The constructor takes the specifier, resolves it, and populates the object with everything it knows about the package.
import { readDirSync } from 'fs';
const tapePkg = new Package('tape'); // resolves the identifier
const pkgDir = tapePkg.packageDir; // access the directory
console.log(readDirSync(pkgDir)); // log the resolved directory contents
Some research on other language approaches: https://docs.microsoft.com/en-us/dotnet/api/system.io.packaging.package?view=netcore-3.1
That seems a bit overkill.
All that I think is needed is Module.getPackageDir(packageSpecifier), and then anything needed from there can be done with fs APIs.
I haven't yet had time to make a PR, but anyone is welcome to beat me to it.
Module.getPackageDir(packageSpecifier) would need a specifier to know where it is starting resolution.
I also am unclear on how any API should integrate with redirection mechanisms like policies and import maps. I'd personally be fine ignoring redirection mechanisms but that could lead to problems where importing doesn't match the API.
We could add a method to the require function so it knows where to start resolution.
Or a function that takes two parameters. getPackageDir(__filename, packageSpecifier) in CJS and getPackageDir(import.meta.url, packageSpecifier) in ESM.
I’d indeed expect a second argument to the static method that took __dirname/import.meta.url. Dangling it off require wouldn’t work in ESM.
I’m not sure how it would interact with policies, but i don’t think there’s any security issues from retrieving a path - if filesystem access is restricted, then you’d be unable to use it, which policies could dictate?
@ljharb I am unclear / don't see an issue at a glance, but the problem can be explained by example:
// redirected to file:///app/node_modules/apm/interceptions/sql.js instead of file:///app/node_modules/sql/index.js
import.meta.resolve('sql');
// likely want file:///app/node_modules/sql/ (trailing slash, yes) ? but that now doesn't match where it is actually getting loaded from
module.package('sql');
This would affect pretty much any custom resolver the more I think about it, but we tend to state for policies that it is the job of the policy creator to manually enforce things like fs access per callsite (by redirecting as desired per scope) and similarly we can just state they need to customize module as well. This is a problem with a single argument form if it isn't an absolute location (path or URL) since it cannot be customized per scope. Similarly, without an absolute location, it wouldn't know what to resolve against, it could use process.cwd() but then things in node_modules can get the wrong values.
Makes perfect sense. Theoretically, this api in node could use the same resolver that would respect policies, and could return the directory that contains that package's package.json - then it wouldn't require extra work on behalf of the policy creator.
The double argument form, with source location, was always the plan; i just failed to mention the second argument in my above comment.
@ljharb if it respects policies, are you saying it should return file:///app/node_modules/apm/? since that is the package for the resolved "sql"? Policies are not realistically human writable (same as any non-trivial import map) so I wouldn't worry too much about work on creating since it can be left to tooling. Policies could always add data for this method to use rather than trying to make it work off the actual resolved location.
It should return whatever path foo would make fs.readFileSync(path.join(foo, 'package.json')) return the expected package.json contents (presuming fs access is available).
@ljharb I'm asking which package.json is expected
That of the sql package, in this case. If the apm wants to provide a fake package dir with a custom package.json it is certainly free to construct one.
@ljharb ah, file:///app/node_modules/sql/ in the example above then; by default it wouldn't resolve using the result of the current data in policies and a new data point would be needed to intercept that (same for loading hooks).
Are there any objections to the following API?
module.getPackageDir(specifier, baseDir)
precise naming aside, that's the only API we've ever really discussed or expected.
module.getPackageDir(specifier, baseDir)
My preference would be to call it getPackageRoot, but otherwise it's fine. In general most docs these days prefer “folder” to “directory,” and also “root” covers you in case this returns an URL (as I assume it would for ESM).
Is the base necessary? Can it be optional, and if omitted it's relative to the current file/module?
I'm fine with "getPackageRoot".
Because the API isn't context-aware (since it's on module and has to be imported, and isn't provided freshly for each module), it has no way of knowing the current file/module - so a base is necessary.
From CJS, you'd pass __dirname, and from ESM, import.meta.url.
Question: How would such a function know what the real package directory is? Theoretically, and practically, there could be a "mini" package.json that is there just for a type: module usage.
@giltayar node's algorithm knows that, because it only looks at the real package directory's package.json for "exports" - and that's the precise knowledge that this API will expose.
@ljharb I do not believe it recursively searches for a package containing "exports" I believe given our documentation it stops on the first package boundary not for a module located in node_modules/(@.../)?...
Maybe I misunderstood the last comment since the resolver wouldn't be recursing up to find the source of a deep import's package boundary with this API but instead resolving the deep import and returning the package.json which performed the last step of the resolve? Maybe we could name this API more clearly.
It’s very explicitly intended to recurse up and find the package boundary; it’s not particularly useful to know the package.json for a given file but it’s necessary to be able to find the package boundary itself.
A different name is fine; it’s the semantic that’s important.
@ljharb the nearest package boundary stops at the first package.json (recursing parent directories) but "exports" check is only during node_modules traversal (only recurses to find the module by a name). I'm now confused on the desired semantics.
The problem is that when a package has “exports” (ie, only when it’s in node_modules) there’s no reliable way to get to the package.json file that has “exports” in it. This api is intended only to solve that problem.
it would be fine, for example, if it rejected when the specifier passed to it wasn’t a package name that’s in node_modules (even on a deep file path) since the only important goal is to get the “exports” field’s package.json’s directory - the package root.
that seems fine, my complaint is we don't define a "package root" currently. as @giltayar points out you can have multiple package boundaries:
/app
/app/node_modules/logger/package.json - package boundary at /app/node_modules/logger
/app/node_modules/logger/esm/package.json - package boundary at /app/node_modules/logger/esm , in theory can contain "exports"
/app/node_modules/logger/cjs/package.json - package boundary at /app/node_modules/logger/cjs , in theory can contain "exports"
if /app/main.js wishes to know which package boundary performed the "exports" resolution of logger/logger.js we get into the confusion.
It sounds like you are trying to define "package root" to be unrelated to the actual filesystem from a traversal after resolution point of view. You want to get /app/node_modules/logger/package.json. This is a short circuiting instead of post resolution recursion. I am not really sure "package root" is a clear term so I think the "exporting package" might be better. For example if we add a redirect:
/app/node_modules/cjs-logger/ -> /app/node_modules/logger/cjs/
Would we expect using the API on cjs-logger/logger.js to return /app/node_modules/logger/cjs/package.json? It seems so from what I'm reading in the comment above. Stating the "right" one isn't really clear and it seems "the one with exports" doesn't make sense if multiple may contain exports. The one which had its "exports" used however is unique and non-reentrant.