Document the inline PnP format
Some projects -- for example pnp-rs -- depend on source extraction from .pnp.cjs.
https://github.com/yarnpkg/pnp-rs/blob/7078905c2fb322172ed3619e37bc6eebc182d1a1/src/lib.rs#L112
The PnP specification makes no mention of RAW_RUNTIME_STATE, or what format this is present in.
Because pnp-rs is part of the yarnpkg org, I assume this is a desired implementation for tools implementing PnP. (If my assumption is incorrect, please close this, and I'll file a correction for pnp-rs.)
Document RAW_RUNTIME_STATE.
Hopefully this makes sense: Tools should be implementing PnP in the best way, and if the best way to implement PnP support is to work around the specification, the specification should be updated to whatever is the best way.
It's technically not part of the PnP spec, but it should be documented anyway. At this point it's a well-defined quirk of the Yarn implementation, and with pnp-rs officially relying on it along with a couple of other tools (e.g. Esbuild), we'd be careful not to break it.
AFAIK that's just the PnP manifest as a JSON string inlined in the hook, right? So this is more about spec-ing how the PnP hook code is laid out?
Alternatively, we can encourage disabling inlining the manifest ("out-lining"?) for external integration? I vaguely remember that out-lining was designed for TypeScript integration but then it never got anywhere so now the default is to inline?
AFAIK that's just the PnP manifest as a JSON string inlined in the hook, right? So this is more about spec-ing how the PnP hook code is laid out?
Yep, exactly. In particular documenting the regexp, and adding a test around it.
Alternatively, we can encourage disabling inlining the manifest ("out-lining"?) for external integration? I vaguely remember that out-lining was designed for TypeScript integration but then it never got anywhere so now the default is to inline?
Correct; the default was always to inline but we added outlining in the hope that it'd be easier for third-parties to integrate. But when the first third-party implemented support (Esbuild), they did so by parsing the inlined format (I don't think they even support the outline format, although I might be wrong?), so I followed the same pattern when porting the feature to pnp-rs.
The outline format can probably be completely removed in the next major imo, it never served its purpose.
Wouldn't it make more sense to move in the opposite direction?
I think it is much better to move both esbuild and pnp-rs to the out-line format as the primary integration method. These two can implement a fallback to parsing the inline format, and future integrators may also opt to do so. If a user wants to integrate with something that only supports the out-line format, they have to disable inlining. This is similar to how esbuild needs to be configured to generate an additional metafile to integrate with third-parties that need it, or how SSR frameworks dump a route manifest for integrating with hosting platforms. We may also switch the default to out-lining in the future.
Maintaining the inline format will not only limit how we can generate the PnP hook, but also make it difficult for others to integrate with PnP. JSON is a widespread and battle-tested format. Virtually all programming languages have JSON parsers. It is much more involved if future integrators have to use regex to find the manifest, parse an escaped JSON string, and detect the end of string like in pnp-rs.
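To make the amount of work concrete, here is a rough sketch of what an integrator has to do for the inline format. It assumes the hook contains `const RAW_RUNTIME_STATE = '...'` as a single-quoted JS string where backslashes and single quotes are backslash-escaped and newlines are written as a backslash followed by a newline. That layout is my assumption about the current Yarn output, not something the spec guarantees, and `extract_inline_manifest` is a hypothetical helper, not the pnp-rs code.

```python
import json
import re

# Assumed marker; the PnP spec does not define this layout.
MARKER = re.compile(r"const\s+RAW_RUNTIME_STATE\s*=\s*'")


def extract_inline_manifest(hook_source: str) -> dict:
    """Find the inlined manifest in a hook file and parse it as JSON."""
    match = MARKER.search(hook_source)
    if match is None:
        raise ValueError("no inlined manifest found")

    chars = []
    i = match.end()
    while i < len(hook_source):
        ch = hook_source[i]
        if ch == "\\":
            if i + 1 >= len(hook_source):
                break  # dangling backslash: treat as unterminated
            nxt = hook_source[i + 1]
            # Backslash + newline is a JS line continuation and yields nothing;
            # any other escaped character is taken literally.
            if nxt != "\n":
                chars.append(nxt)
            i += 2
        elif ch == "'":
            # First unescaped quote closes the string literal.
            return json.loads("".join(chars))
        else:
            chars.append(ch)
            i += 1
    raise ValueError("unterminated manifest string")
```

Even in this simplified form, the integrator needs a regex, a hand-written string scanner, and knowledge of the escaping rules before a JSON parser can be used at all.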
As a simple example, suppose I want to integrate with PnP in a bash script: cat $MANIFEST | jq is much easier than having to either read the entire PnP hook into memory and run a regex over it, or do a while read loop.
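The same comparison in a general-purpose language, as a minimal sketch. It assumes inlining has been disabled (as far as I know, via `pnpEnableInlining: false`) and that the manifest then sits next to the hook as `.pnp.data.json`; both the file name and the helper `load_outlined_manifest` are assumptions for illustration.

```python
import json
from pathlib import Path


def load_outlined_manifest(project_root: str) -> dict:
    """Read the out-line manifest as plain JSON; no regex or unescaping needed."""
    # Assumed file name for the out-line manifest.
    manifest_path = Path(project_root) / ".pnp.data.json"
    with manifest_path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

The whole integration step is one call to a standard JSON parser, which is the point of the out-line format.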
With TypeScript, we made the design decision that the out-line format is how integrators that cannot or will not run live JS should integrate with PnP. I appreciate Evan taking the leap to be the first non-JS integrator, but I don't think we should overturn that decision just because the first integrator did it in a suboptimal way.
With how the JS tooling ecosystem is moving towards non-JS languages (e.g. tsgo, oxc), I think the decision we make now will have long-term effects on how common PnP integration will be, and the out-line format is much easier to sell integrators on. Also keep in mind that we have another shot at TS integration with tsgo, which may once again make the out-line format necessary.
With how the JS tooling ecosystem is moving towards non-JS languages (e.g. tsgo, oxc), I think the decision we make now will have long-term effects on how common PnP integration will be, and the out-line format is much easier to sell integrators on. Also keep in mind that we have another shot at TS integration with tsgo, which may once again make the out-line format necessary.
That's true to some extent but, in the case of Rust, the pnp-rs crate (which we maintain) is the ideal way to consume the data, as the maintenance of the implementation is shared with our team, which is a better sell than asking third-party projects to reimplement the spec from scratch. That's the approach both oxc-resolver and rspack-resolver took, for example.
I know someone is working on another Go implementation of the spec, but from what I understand the regex part wasn't a sticking point (if it was, it'd be easy to only support the out-of-line format).
With how the JS tooling ecosystem is moving towards non-JS languages (e.g. tsgo, oxc), I think the decision we make now will have long-term effects on how common PnP integration will be
This is the impetus. Non-JS tools have proven themselves, and they will only become more common. Better to be 100% clear about the format now.
There is a radical idea: that the data format itself is the interface. (After all... isn't it? That's not an opinion, that is a fact -- tsgo, rspack, oxide, esbuild, bun, swc, etc. need to include a library or implementation for their language.)
You can generate a shim file as a convenience for some things (like a Node.js preload), but the default universal format will be a JSON structure, whether or not it happens to be wrapped in some .js file. And if that's true, maybe a .js file is the wrong one.
That's true to some extent but, in the case of Rust, the pnp-rs crate (which we maintain) is the ideal way to consume the data, as the maintenance of the implementation is shared with our team, which is a better sell than asking third-party projects to reimplement the spec from scratch. That's the approach both oxc-resolver and rspack-resolver took, for example.
I am coming more from the angle that pnp-rs is more or less a de facto reference implementation of the PnP spec for non-JS languages. If it integrates by parsing the inline format, we are sending the message that this is the optimal way to implement PnP in non-JS languages. I don't think we should do that; we should instead treat the out-line format as the optimal integration method and change pnp-rs to convey that.