bun icon indicating copy to clipboard operation
bun copied to clipboard

Bun runtime plugin `onResolve` doesn't filter non-`file:` protocol imports

Open TomasHubelbauer opened this issue 1 year ago • 33 comments

What version of Bun is running?

1.1.0+5903a6141

What platform is your computer?

Darwin 23.4.0 arm64 arm

What steps can reproduce the bug?

See my repro repo: https://github.com/TomasHubelbauer/bun-runtime-plugin-onResolve-custom-protocol

Make a runtime plugin with an onResolve hook:

import { plugin } from "bun";

await plugin({
  name: "test",
  async setup(build) {
    console.log("Plugin 1 loaded!", build);

    build.onResolve({ filter: /\.demo$/ }, (args) => {
      console.log("onResolve1", args);

      return {
        path: './demo1.ts'
      };
    });
  },
});

Register it in bunfig.toml: preload = ["./plugin.ts"].

Run a script with an import that should get captured by the plugin: bun index.ts:

import demo1 from '1.demo';

console.log('demo 1', demo1);

So far everything works as expected:

Plugin 1 loaded! {
  target: "bun",
  onLoad: [Function: onLoad],
  onResolve: [Function: onResolve],
  module: [Function: module],
}
onResolve1 {
  path: "1.demo",
  importer: "/Users/tom/Desktop/bun-runtime-plugin-onResolve/index.ts",
}
demo 1 Hello, 1!

Change the plugin to filter for a custom protocol:

import { plugin } from "bun";

await plugin({
  name: "test",
  async setup(build) {
    console.log("Plugin 2 loaded!", build);

    build.onResolve({ filter: /^demo:/ }, (args) => {
      console.log("onResolve2", args);

      return {
        path: './demo2.ts'
      };
    });
  },
});

Change the import and run the script again:

import demo2 from 'demo:2';

console.log('demo 2', demo2);

Now we see a problem:

Plugin 2 loaded! {
  target: "bun",
  onLoad: [Function: onLoad],
  onResolve: [Function: onResolve],
  module: [Function: module],
}
error: Cannot find package "demo:2" from "/…/index.ts"

Notice that onResolve2 was never even called.

This might be a design decision, not a real bug, if I learn as much, I will contribute a documentation change explaining this.

What is the expected behavior?

Plugin 2 loaded! {
  target: "bun",
  onLoad: [Function: onLoad],
  onResolve: [Function: onResolve],
  module: [Function: module],
}
onResolve2 {
  path: "demo:2",
  importer: "/Users/tom/Desktop/bun-runtime-plugin-onResolve/index.ts",
}
demo 2 Hello, 2!

What do you see instead?

A runtime error in module resolution not preceded by the custom onResolve hook call.

Additional information

This might not be a bug but a design decision. I ended up trying this, because build.module doesn't accept a regex for the module name (so that I could virtualize all modules on a custom protocol) so I ended up looking at onResolve. Ultimately I would prefer a change to build.module that allows for regexes, but this might also be worth supporting IMO.

TomasHubelbauer avatar Apr 02 '24 20:04 TomasHubelbauer

In theory this could be resolved by supporting import maps. Then we wouldn't need to use plugins.

guest271314 avatar Apr 03 '24 14:04 guest271314

That's interesting. I found https://github.com/oven-sh/bun/issues/4458 talking about adding import map support to the bundler. But I wonder how one would define import maps for the runtime? Bunfig?

TomasHubelbauer avatar Apr 03 '24 14:04 TomasHubelbauer

Ultimately my interest lies in being able to make up a module from thin air when using a custom protocol in the import. Ideally build.module would accept a regex not just a string which would allow me to do that. I ended up looking at onResolve and onBuild as a hacky combination to achieve the same. My logic was that onResolve could catch these imports and drop a temporary file and redirect to it under a different extension and onLoad would pick up that and return the file as { contents } and delete it (I don't want it to hand around on the disk - this would just be a hack in lieu of build.module regex support) so import maps wouldn't help in this specific case.

TomasHubelbauer avatar Apr 03 '24 14:04 TomasHubelbauer

Like this https://github.com/guest271314/webbundle/blob/main/deno.json

{
  "imports": {
    "base32-encode": "https://esm.sh/[email protected]",
    "cborg": "https://esm.sh/[email protected]",
    "commander": "https://esm.sh/[email protected]",
    "mime": "https://esm.sh/[email protected]",
    "to-data-view": "https://esm.sh/[email protected]",
    "wbn": "https://esm.sh/[email protected]",
    "wbn-sign-webcrypto": "https://raw.githubusercontent.com/guest271314/wbn-sign-webcrypto/main/lib/wbn-sign.js",
    "zod": "https://esm.sh/[email protected]"
  }
}

Then bun build --import-map="./deno.json".

The idea for me to get away from runtime-specific inventions, so we can run the same code in bun, deno, and node, et al.

guest271314 avatar Apr 03 '24 14:04 guest271314

This is related to bun build, but what about the bun index.ts case - no build step, just Bun running a file at runtime with some plugins in the Bunfig preload. I would expect import map support would include the ability to specify (and mutate) the import map at runtime the same way that is possible in the browser.

Edit: turns out mutating import maps at runtime is not possible in browsers, I mis-remembered: https://github.com/WICG/import-maps/issues/92.

TomasHubelbauer avatar Apr 03 '24 14:04 TomasHubelbauer

Ultimately my interest lies in being able to make up a module from thin air when using a custom protocol in the import.

For your use case you should be able to use a Data URL or a Bob URL without having to create a custom protocol. See Deno dynamic import("./exports") throws module not found for "exports.js" dynamically created in the script

  const script = `export default 1;`;

then https://gist.github.com/guest271314/4637bb1288321256d7c14c72ebc81137#file-dynamic_import_always_throws-js-L24-L26

 // bun
  if (runtime.includes("Bun")) {
    await Bun.write("exports.js", encoder.encode(script));
  }

then use dynamic import() https://gist.github.com/guest271314/4637bb1288321256d7c14c72ebc81137#file-dynamic_import_always_throws-js-L27-L29

  const { default: module } = await import("./exports.js"); // Raw string specifier
  console.log({ module });
  console.log({ runtime });

Deno's dynamic import() ain't dynamic. So far Bun's is.

guest271314 avatar Apr 03 '24 14:04 guest271314

I am not attempting to create a custom protocol, I have an existing custom protocol which I am looking to bridge Bun into supporting. It is possible that runtime plugins are intentionally locked into the file: protocol only, but if that design decision is not strongly held, I would definitely like runtime plugins to support any protocol / filter against any import specifier, not just the ones with implicit file: protocol.

Node has experimental support for custom loaders which I believe are roughly equivalent to Bun's runtime plugins and AFAICT it should be possible to create a custom loader which matches the URL by protocol, here demonstrated on a rudimentary HTTP/HTTPS loader: https://nodejs.org/api/module.html#import-from-https

Whether it is by build.module module virtualization with regex module name matching or extending onResolve and onLoad to support non-file protocol, this would enable something similar to what's possible there.

TomasHubelbauer avatar Apr 03 '24 14:04 TomasHubelbauer

Why is the custom protocol necessary? Can't you just do

import(`data:text/javascript,export default 1`)

?

guest271314 avatar Apr 03 '24 14:04 guest271314

Whatever is on the other side of the import is dynamic, I can't bake it in or use a static data: protocol import.

TomasHubelbauer avatar Apr 03 '24 14:04 TomasHubelbauer

Reproduced on Bun version 1.0.36 (bun-linux-x64-baseline) re onResolve.

I was able to get the expected work using import.meta.url, new URL(), and build.module()

import { plugin } from "bun";

await plugin({
  name: "test2",
  async setup(build) {
    console.log("Plugin 2 loaded!", build);
    build.module(
      // The specifier, which can be any string - except a built-in, such as "buffer"
      "demo:2",
      async () => {
        const metaURL = new URL(await import.meta.resolve("./demo2.ts"), import.meta.url);
        return {
          contents: await (await fetch(metaURL.href)).text(),
          loader: "ts",
        };
      }
    )
/*
    build.onResolve({ filter: /^demo:/ }, (args) => {
      console.log("onResolve2", args);

      return {
        path: './demo2.ts'
      };
    });
*/
  },
});

guest271314 avatar Apr 04 '24 01:04 guest271314

This alone works too

const metaURL = new URL("./demo2.ts", import.meta.url);

guest271314 avatar Apr 04 '24 01:04 guest271314

Edit: turns out mutating import maps at runtime is not possible in browsers, I mis-remembered: https://github.com/WICG/import-maps/issues/92.

I'm not sure how that issue is related. Bun doesn't implement import maps so we can't test import maps at all in bun.

Technically this is possible bun and deno, where we can fetch() file: protocol using absolute path:

const script = await (await fetch("file:///absolute/path/to/local/file")).text();
const imports = await import(new Blob([script], {type:"text/javascript"}));

And possible for node which does not support file: protocol for the Undici fetch() implementation (see Implementing file: protocol support for WHATWG fetch() in Node.js without fs or import) you can modify Undici/lib/web/fetch/index.js#L129 to include

  const p = createDeferredPromise()
  const url = new URL(input);
  if (url.protocol === "file:") {
    return import("node:fs").then((fs) => {
      p.resolve(new Response(fs.readFileSync(url)));
      return p.promise;
    })       
  }

for the capability to to use file: protocol with fetch() using node.

Then you can use dynamic import() for any script you create in the script or not. For deno with the caveat that bare string specifers behave differently than bun or deno dynamic_import_always_throws.js.

guest271314 avatar Apr 04 '24 04:04 guest271314

Should be

const imports = await import(URL.createObjectURL(new Blob([script], {type:"text/javascript"})));

to create a Blob URL. Which also gets around denos bare string specifier behaviour. Then you can iport anything from anywhere, particularly using import assertions with type set to "json" and data serialized to a Uint8Array spread to an Array and stringified to JSON.

guest271314 avatar Apr 04 '24 04:04 guest271314

This is hownode works

// https://github.com/nodejs/node/blob/main/doc/api/module.md#import-from-https
// https-hooks.mjs
import { get } from "node:https";

export function load(url, context, nextLoad) {
  // For JavaScript to be loaded over the network, we need to fetch and
  // return it.
  if (url.startsWith("https://")) {
    return new Promise((resolve, reject) => {
      get(url, (res) => {
        let data = "";
        res.setEncoding("utf8");
        res.on("data", (chunk) => data += chunk);
        res.on("end", () =>
          resolve({
            // This example assumes all network-provided JavaScript is ES module
            // code.
            format: "module",
            shortCircuit: true,
            source: data,
          }));
      }).on("error", (err) => reject(err));
    });
  }

  // Let Node.js handle all other URLs.
  return nextLoad(url);
}
node --experimental-default-type=module --experimental-network-imports --loader=./https-hooks.js ./test-network-imports.js

or

node --experimental-default-type=module --experimental-network-imports --import 'data:text/javascript,import { register } from "node:module"; import { pathToFileURL } from "node:url"; register(pathToFileURL("./https-hooks.js"));' ./test-network-imports.js

See

feross/buffer bundled to Ecmascript Module.

guest271314 avatar Apr 04 '24 04:04 guest271314

Using deno with an import map deno run --import-map=import-map.json index.ts

import-map.json

{
  "imports": {
    "1.demo": "./demo1.ts",
    "demo:2": "./demo2.ts"
  }
}

guest271314 avatar Apr 04 '24 05:04 guest271314

Thanks for the PR, I noted in the repro repo README and in my initial post in this issue that I can't use build.module because it doesn't take a regex. demo:2 is not meant to be a module name, the plugin is meant to be able to handle any resolution starting with demo:.

As far as import maps go, I was just pondering how it could look, I am aware Bun doesn't support import maps at the moment, but should it, I would hope for an implementation where the import map can be passed to bun index.ts via the CLI or Bunfig or some other means, not only baked into a build via bun build.

I am aware of the Node loader support, I've linked it previously. This is something I am hoping to do an equivalent of in Bun, because it seems like it can support custom protocols as the HTTP/HTTPS loader demonstrates. I believe if build.module took a regex as an option in addition to a string, it would be possible to add this type of support to Bun as well.

I've been able to build Bun locally and while I am not at al proficient in Zig, I think I've gathered enough context by now to maybe be able to contribute regex support to build.module. I'm not very good at debugging Zig yet, though, so it's a lot of wading through log statements.

TomasHubelbauer avatar Apr 04 '24 06:04 TomasHubelbauer

I can't use build.module because it doesn't take a regex

Sure you can use build.module. RegExp is not special. Just write out all the "demo:*" specifiers you are expecting.

The Node loader and Bun plugin systems are roughly the same from my perspective. Import maps are by far the least amount of code and less cumbersome.

guest271314 avatar Apr 04 '24 13:04 guest271314

I cannot write out all the specifiers I am expecting as the specifiers are dynamic, not known at compile time.

TomasHubelbauer avatar Apr 04 '24 13:04 TomasHubelbauer

You have to be able to write them out at some point in the process, else you will be getting "Module not found" errors.

guest271314 avatar Apr 04 '24 13:04 guest271314

With build.module being able to return virtual modules, there is no need to map every module specifier to a path on the disk, especially if the module isn't backed by a physical path. The Node HTTP loader is a good example of this.

https://bun.sh/docs/runtime/plugins#virtual-modules

TomasHubelbauer avatar Apr 04 '24 13:04 TomasHubelbauer

I am expecting as the specifiers are dynamic, not known at compile time.

there is no need to map every module specifier to a path on the disk

Yes, there is. How else are you going to resolve some arbitrary specifier to a real path?

That doesn't matter. There MUST be a mapping between the specifiers you are expecting and URL's. You MUST already know what arbitrary specifiers map to which local or remote paths. There is no escaping these technical facts.

From my perspective both Node.js loader and Bun plugin systems are clucky and less ideal than import maps - but that's all node and bun have since neither has import maps. Node.js loader system doesn't offer anything novel in this regard cf. Bun' plugin interface.

guest271314 avatar Apr 04 '24 13:04 guest271314

I am expecting as the specifiers are dynamic, not known at compile time.

Then how are you going to write a RegExp for the specifier? You can't in this case.

guest271314 avatar Apr 04 '24 13:04 guest271314

Here's the simplest case I can think of to demonstrate build.module with a regex:

import { plugin } from "bun";

await plugin({
  name: "test",
  async setup(build) {
    console.log("Plugin loaded!", build);

    build.build(/^demo:/, (specifier) => {
      return {
        contents: `export default "Hi there, I am the '${specifier}' import!";`,
        loader: 'ts'
      };
    });
  },
});
console.log(await import('demo:alice')); // "Hi there, I am the 'alice' import!"
console.log(await import('demo:bob')); // "Hi there, I am the 'bob' import!"

TomasHubelbauer avatar Apr 04 '24 13:04 TomasHubelbauer

As I suggested above you can use a Blob URL or Data URL for that.

Even simpler is to just write out the "demo:alice" and "demo:bob" specifiers.

guest271314 avatar Apr 04 '24 13:04 guest271314

I see what you are trying to do now. You are correct, Bun doesn't have the specifier parameter string as a parameter exposed in the function that returns content and loader. That's a reasonable feature request.

The RegExp parts doesn't do anything special though. We could just as well use string methods to to match and do stuff. What you want is that specifier in the parameter passed to the function that returns content and loaders type.

guest271314 avatar Apr 04 '24 14:04 guest271314

I'll try t figure out a way or two to achieve what you are tryiing to do with what we already have exposed.

guest271314 avatar Apr 04 '24 14:04 guest271314

Here's one way to achieve the requirement, including still using bun.plugin(). We parse the file that is going to be passed to bun run in the preloaded script, before we call plugin(). That is, we do what we want in the preload script outside of just relying on bun.plugin() to do the work. We use RegExp to match specifiers begining an arbitrary specifier: specifier, replace the exported content.

We can do this without bun.plugin(), too.

const url = new URL("./index.ts", import.meta.url);
const script = await (await fetch(url.href)).text();
const regexp = /(?<=['"])demo:\w+(?=['"])/g;
const matches = script.match(regexp);

import { plugin } from "bun";

for (const specifier of matches) {
  await plugin({
    name: specifier,
    async setup(build) {
      console.log("Plugin 2 loaded!", build);
      build.module(
        // The specifier, which can be any string - except a built-in, such as "buffer"
        specifier,
        async () => {
          console.log({ specifier });
          return {
            contents: `export default "Hi there, I am the '${specifier}' import!";`,
            loader: "ts",
          };
        },
      );
    },
  });
}

guest271314 avatar Apr 05 '24 01:04 guest271314

The script file name passed to bun is dynamically retrievable using Bun.argv

plugin2.ts

const [, file] = Bun.argv;
const url = new URL(file, import.meta.url);
// ...

guest271314 avatar Apr 05 '24 01:04 guest271314

Substituting Bun.file() for fetch() in plugin2.ts

const url = await import.meta.resolve(file);
const script = await Bun.file(url).text();

then we can do bun run index.ts which contains

import demo1 from '1.demo';

console.log('demo 1', demo1);

import demo2 from 'demo:2';

console.log('demo 2', demo2);

console.log(await import('demo:alice')); // "Hi there, I am the 'alice' import!"
console.log(await import('demo:bob')); // "Hi there, I am the 'bob' import!"

and get this result

Plugin 1 loaded! {
  target: "bun",
  onLoad: [Function: onLoad],
  onResolve: [Function: onResolve],
  module: [Function: module],
}
[ "/home/user/bin/bun", "/home/user/bun-runtime-plugin-onResolve-custom-protocol/index.ts"
]
Plugin 2 loaded! {
  target: "bun",
  onLoad: [Function: onLoad],
  onResolve: [Function: onResolve],
  module: [Function: module],
}
Plugin 2 loaded! {
  target: "bun",
  onLoad: [Function: onLoad],
  onResolve: [Function: onResolve],
  module: [Function: module],
}
Plugin 2 loaded! {
  target: "bun",
  onLoad: [Function: onLoad],
  onResolve: [Function: onResolve],
  module: [Function: module],
}
onResolve1 {
  path: "1.demo",
  importer: "/home/user/bun-runtime-plugin-onResolve-custom-protocol/index.ts",
}
{
  specifier: "demo:2",
}
demo 1 Hello, 1!
demo 2 Hi there, I am the 'demo:2' import!
{
  specifier: "demo:alice",
}
Module {
  default: "Hi there, I am the 'demo:alice' import!",
}
{
  specifier: "demo:bob",
}
Module {
  default: "Hi there, I am the 'demo:bob' import!",
}

guest271314 avatar Apr 05 '24 02:04 guest271314

To not hard code any specifier we can use RegExp, too, to match static import ... or dynamic import( ... )

const regexp = /(?<=['"]|import\s\w+\sfrom(?=\s|)|import\((?=\s|))[\w\d+]+:[\w\d]+(?=['"])/g;

guest271314 avatar Apr 05 '24 06:04 guest271314