html: Import with type text, bytes, and URL

I’m working with a group of TC39 delegates on what we call Module Harmony, an effort to make proposals pertaining to the module system coherent. I am consequently looking for the right venue to propose and establish a precedent for a host integration with modules, specifically to address the portability of code that uses the module system to express a dependency upon plain text, bytes, or references to assets. Concretely, I would like to propose that:

import text from './text.txt' with { type: 'text' };
import bytes from './bytes.oct' with { type: 'bytes' };
import imageUrl from './image.jpg' with { type: 'url' };

Such that:

typeof text === 'string';
bytes instanceof Uint8Array;
typeof imageUrl === 'string'; // edit: was instanceof URL

The goal is for a module to express these kinds of dependency portably. Specifically, I want a program to run on both the server side and the client side of a web application, both raw and through an optimizing translation (e.g., bundling). With import attributes, ECMA-262 is already expressive enough for a host integration to address this problem without additional features, and such an integration would be coherent with future ECMA-262 proposals, particularly virtual module sources.
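To make the intended result types concrete, here is a minimal sketch of how a host might turn a fetched body into the default export for each `type`. The function `assetDefaultExport` and its parameters are hypothetical; it assumes the host has already resolved the specifier to an absolute URL string and, for `text` and `bytes`, fetched the body into an `ArrayBuffer`. Decoding text as UTF-8 is also an assumption here.

```javascript
// Hypothetical host-side helper: maps an import attribute `type` to the
// value a module would receive as its default export.
function assetDefaultExport(type, resolvedUrl, body) {
  switch (type) {
    case 'url':
      // No fetch needed: the default export is just the resolved URL string.
      return resolvedUrl;
    case 'bytes':
      // A Uint8Array view over the fetched body.
      return new Uint8Array(body);
    case 'text':
      // Decode the body as UTF-8 (invalid sequences become U+FFFD).
      return new TextDecoder('utf-8').decode(body);
    default:
      throw new TypeError(`Unsupported asset type: ${type}`);
  }
}

// Usage with an in-memory body ("hello") standing in for a fetched response.
const body = new Uint8Array([104, 101, 108, 108, 111]).buffer;
const text = assetDefaultExport('text', 'https://example.com/text.txt', body);
const bytes = assetDefaultExport('bytes', 'https://example.com/bytes.oct', body);
const imageUrl = assetDefaultExport('url', 'https://example.com/image.jpg');
```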

Jun 19 '23 17:06 kriskowal

Re: the URL import, using a URL instance for this seems to contradict the guidance in the WHATWG URL Standard:

A standard that exposes URLs should expose the URL as a string (by serializing an internal URL). A standard should not expose a URL using a URL object. URL objects are meant for URL manipulation.

For a module, not using the mutable URL representation would seem particularly important, I’d think?
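The concern is easy to demonstrate: a `URL` object is mutable, so if every importer of a module received the same instance, any one of them could change what the others see. The names below (`sharedUrl`, `sharedString`, `careless`) are illustrative only.

```javascript
// If a module's default export were a URL object, every importer would share it.
const sharedUrl = new URL('https://example.com/image.jpg');
const sharedString = sharedUrl.href;

// One "importer" mutates the shared object...
function careless(url) {
  url.pathname = '/other.jpg';
}
careless(sharedUrl);

// ...and every other importer now observes the change.
console.log(sharedUrl.href); // https://example.com/other.jpg
// Strings are immutable, so the serialized form is unaffected.
console.log(sharedString); // https://example.com/image.jpg
```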

Jun 19 '23 17:06 bathos

For a module, not using the mutable URL representation would seem particularly important, I’d think?

A string representation of the URL would entirely satisfy the motivating use cases.

Jun 19 '23 17:06 kriskowal

Looking at https://fetch.spec.whatwg.org/#body-mixin, I wonder if we want arrayBuffer instead, but I suppose that was a mistake on Fetch's part and it should have been bytes returning a view (we could still add that, I suppose).

I'm not sure I understand how url works. How is it different from import.meta.resolve('image.jpg')?

Do we need to solve @domenic's #7017 about feature detection at the same time?

There's also #4321 from @jamesernator and #7706 from @7ombie. These all look like duplicates, but I'm fine with keeping them open until we have some kind of plan. One thing that's raised in the latter that's important here is what to do about MIME types. Would we not check response MIME types for these, similar to Fetch? Or would we try to enforce something?

cc @whatwg/modules

Jun 20 '23 06:06 annevk

How is it different from import.meta.resolve('image.jpg')

Good question. The semantics would actually be the same. One piece of motivation is that this form is more "declarative"-looking and therefore statically analyzable (which should mostly help build tools, given that not enough information is available for a prefetcher to use this). See more information about motivation (for a previous iteration of this idea) at https://github.com/tc39/proposal-asset-references . Also note that some people in TC39 are considering whether we should propose some other syntax for this, besides using import attributes.
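For a relative specifier with no import map involved, the value a `type: 'url'` import would produce is what plain URL resolution against the importing module's URL gives, which is also what `import.meta.resolve` returns in that case. A minimal sketch, with `moduleUrl` standing in for `import.meta.url` (a hypothetical location) so it runs outside a module. A bare specifier such as `'image.jpg'` is handled differently: browsers look it up in the import map and fail if there is no matching entry.

```javascript
// Stand-in for import.meta.url in a module at this (hypothetical) location.
const moduleUrl = 'https://example.com/app/main.js';

// For relative specifiers without an import map, this matches what
// import.meta.resolve('./image.jpg') would return inside that module.
function resolveAssetUrl(specifier, referrerUrl) {
  return new URL(specifier, referrerUrl).href;
}

const imageUrl = resolveAssetUrl('./image.jpg', moduleUrl);
// imageUrl === 'https://example.com/app/image.jpg'
```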

Jun 20 '23 15:06 littledan

Yes, this would provide a statically analyzable alternative route to the same url value, analogous to static vs. dynamic import. This matters less for the web itself than for the convention it establishes, which build tooling would benefit from.

For example, a bundler that takes a whole web application directory tree and generates a new tree would be able to discern the dependency and rewrite the URL.
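As a rough illustration of the tree-to-tree case, a bundler could replace the asset import with a generated module whose default export points at the asset's new location. The helper `emitUrlModule` and the hashed output path are hypothetical; emitting `new URL(..., import.meta.url)` keeps the URL relative to wherever the generated module ends up deployed.

```javascript
// Hypothetical code generator: produces the source of a replacement module
// whose default export is the URL of the relocated asset.
function emitUrlModule(outputRelativePath) {
  // JSON.stringify yields a valid, escaped JavaScript string literal.
  const literal = JSON.stringify(outputRelativePath);
  return `export default new URL(${literal}, import.meta.url).href;\n`;
}

const generated = emitUrlModule('./assets/image.3f9a2c.jpg');
// generated === 'export default new URL("./assets/image.3f9a2c.jpg", import.meta.url).href;\n'
```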

A bundler that takes a whole web application tree and generates a single JavaScript file would have the option of inlining the asset as a data URL.
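For the single-file case, inlining amounts to base64-encoding the asset into a `data:` URL. A minimal sketch; `toDataUrl` is a hypothetical helper, and it assumes the bundler already knows the asset's MIME type (for example, from its file extension).

```javascript
// Hypothetical helper for a single-file bundle: inline the asset as a data: URL.
function toDataUrl(mimeType, bytes) {
  // btoa expects a "binary string" with one character per byte.
  let binary = '';
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

const inlined = toDataUrl('text/plain', new Uint8Array([104, 105]));
// inlined === 'data:text/plain;base64,aGk='
```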

That’s to say, any static syntax that reveals the URL of an asset, in a way that signals a dependency a bundler needs to arrange, is an improvement on the status quo. This is one of the options we are considering.

As @annevk mentions in chat, this approach has the disadvantage of introducing a code path under the host import hook that bypasses a fetch.

For this reason, the alternative approach is to introduce another import phase, as the import source and import defer proposals do, except that this phase would occur before fetch. This has a different smell: it is not clear that such a module would advance beyond the asset phase. It is clear that it would not compose well with import with type, since the type is irrelevant unless we advance to fetch. We would presumably be obliged to allow the module system to fetch an image (for example) and fail to interpret it as JavaScript.

The implication for Module virtualization is that an asset import would have to bypass the import hook and provide an alternate lane that can be interrupted before fetch (to produce a url) and then again before parse (to produce bytes or text) before possibly proceeding to produce source, at which point it will have done all the work currently subsumed by the host import hook.
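The staged lane described above can be sketched as a loader that stops at the earliest phase the importer asks for, so a `url` request never triggers a fetch. Everything here is hypothetical: `loadAssetAtPhase`, the phase names, and the synchronous `fetchBytes` callback (standing in for the host's fetch) are illustrative, not part of any proposal.

```javascript
// Hypothetical phases, in the order the lane would pass through them.
const PHASES = ['url', 'bytes', 'text'];

function loadAssetAtPhase(specifier, referrerUrl, phase, fetchBytes) {
  if (!PHASES.includes(phase)) {
    throw new TypeError(`Unknown phase: ${phase}`);
  }
  // Phase 1: resolve. Stopping here never touches the network.
  const url = new URL(specifier, referrerUrl).href;
  if (phase === 'url') return url;

  // Phase 2: fetch. Stopping here yields the raw bytes.
  const bytes = fetchBytes(url);
  if (phase === 'bytes') return bytes;

  // Phase 3: decode. A JavaScript source phase would continue past this point.
  return new TextDecoder('utf-8').decode(bytes);
}

// Usage: count fetches to confirm the 'url' phase skips them.
let fetchCount = 0;
const fakeFetch = (_url) => {
  fetchCount += 1;
  return new Uint8Array([104, 105]); // "hi"
};
const base = 'https://example.com/app/main.js';
const assetUrl = loadAssetAtPhase('./image.jpg', base, 'url', fakeFetch);
const assetText = loadAssetAtPhase('./note.txt', base, 'text', fakeFetch);
```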

[added:]

The implication for Module virtualization if we pursue import with type instead is simply that these are different module source types that terminate by exporting a default value when they’re evaluated. So, the proposed import hook virtualization would just return a non-JavaScript module source with the appropriate behavior.
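Under the with-type approach, a virtualized import hook only needs to hand back a module source whose evaluation yields a namespace with a single `default` export. The object shape below (`{ type, evaluate() }`), `makeAssetModuleSource`, and `importHook` are purely hypothetical and do not reflect any actual virtual module source API; they only show how little such a source needs to do.

```javascript
// Hypothetical non-JavaScript module source: evaluation just exports a default.
function makeAssetModuleSource(type, value) {
  return {
    type,
    evaluate() {
      // Real module namespace objects have a null prototype and are not
      // extensible; a frozen null-prototype object approximates that here.
      return Object.freeze(Object.assign(Object.create(null), { default: value }));
    },
  };
}

// A hypothetical virtualized import hook choosing a source by `type`.
function importHook(specifier, attributes, loadText) {
  if (attributes.type === 'text') {
    return makeAssetModuleSource('text', loadText(specifier));
  }
  throw new TypeError(`No virtual source for type ${attributes.type}`);
}

const source = importHook('./note.txt', { type: 'text' }, () => 'hi');
const namespace = source.evaluate();
// namespace.default === 'hi'
```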

Jun 20 '23 16:06 kriskowal

There's also #4321 from @Jamesernator

The suggestions there had quite a different flavour, given that at the time JSON modules were proposed to have their type derived from the MIME type, rather than the current approach that uses import attributes (with which the MIME type must agree).

This new style with import attributes is strictly more useful, as one can interpret essentially anything as an array buffer or text regardless of its actual MIME type.

E.g., in my previous suggestion, text would only be successfully imported if it were text/plain, but a lot of content might be text/yaml, text/json5, etc.

As such that old issue can be closed in strong favour of this one.

One thing that's raised in the latter that's important here is what to do about MIME types. Would we not check response MIME types for these, similar to Fetch? Or would we try to enforce something?

For urls there's obviously nothing to do as no fetching is involved.

For array buffers, checking MIME types is undesirable, as people might be loading any content for some processing (images, audio, and application-specific formats are all reasonable things to import as array buffers).

For text, checking the type/essence is undesirable for the same reason as for array buffers: any MIME type (not just text/*) might contain text. However, we do need to know the encoding, so the charset parameter should probably be respected.
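A simplified sketch of that policy: ignore the MIME type essence entirely and consult only its `charset` parameter, falling back to UTF-8. `charsetFromContentType` is a hypothetical helper, and its parsing is deliberately naive (it does not handle quoted parameter values the way the WHATWG MIME Sniffing parser does).

```javascript
// Hypothetical helper: extract the charset parameter from a Content-Type
// header, ignoring the type/subtype essence. Defaults to UTF-8.
function charsetFromContentType(contentType) {
  if (!contentType) return 'utf-8';
  // Skip the essence (before the first ';') and scan the parameters.
  const params = contentType.split(';').slice(1);
  for (const param of params) {
    const eq = param.indexOf('=');
    if (eq === -1) continue;
    const name = param.slice(0, eq).trim().toLowerCase();
    const value = param.slice(eq + 1).trim().toLowerCase();
    if (name === 'charset' && value !== '') return value;
  }
  return 'utf-8';
}
```

The returned label could then be handed to a `TextDecoder`, whatever the essence (text/yaml, application/octet-stream, and so on) happens to be.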

Alternatively, for text, we could have a separate attribute that indicates which encoding to decode with (potentially useful if the server doesn't know what charset the files are using).

import utf16Text from "./file.txt" with { type: "text", encoding: "utf16" };
import latin2Text from "./oldData.dat" with { type: "text", encoding: "latin2" };

// Would default to utf8 naturally, so these two would be equivalent
import explicitUtf8 from "./file2.ini" with { type: "text", encoding: "utf8" };
import defaultUtf8 from "./file2.ini" with { type: "text" };
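One way a host might honor such an `encoding` attribute is to pass it straight to `TextDecoder`, defaulting to UTF-8 when it is absent. This is a sketch under that assumption: `decodeTextAsset` is hypothetical, and `TextDecoder` accepts only WHATWG Encoding labels (for example `'utf-8'`, `'utf8'`, `'utf-16le'`, `'latin2'`), throwing a `RangeError` for anything else.

```javascript
// Hypothetical: decode a text asset according to its `encoding` attribute.
function decodeTextAsset(bytes, attributes = {}) {
  const encoding = attributes.encoding ?? 'utf-8';
  // Throws RangeError if `encoding` is not a recognized WHATWG Encoding label.
  const decoder = new TextDecoder(encoding);
  return decoder.decode(bytes);
}

const utf8Bytes = new Uint8Array([104, 105]); // "hi" in UTF-8
const utf16Bytes = new Uint8Array([104, 0, 105, 0]); // "hi" in UTF-16LE

const implicit = decodeTextAsset(utf8Bytes, { type: 'text' });
const explicit = decodeTextAsset(utf8Bytes, { type: 'text', encoding: 'utf8' });
const wide = decodeTextAsset(utf16Bytes, { type: 'text', encoding: 'utf-16le' });
```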

Jun 29 '23 06:06 Jamesernator

In Bun v1.1.5, we are adding bundler & runtime support for text, json & toml. text is UTF-8 and replaces invalid UTF-8 with U+FFFD. We will probably support a BOM later to handle UTF-16. Named imports (other than default) with type: "text" throw an error at parse time.

https://github.com/oven-sh/bun/pull/10456
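The UTF-8 behavior described for `type: "text"` can be approximated with the standard `TextDecoder`: in its default non-fatal mode, invalid UTF-8 is replaced with U+FFFD, and with `ignoreBOM` left at its default, a leading BOM is consumed rather than returned. This is not Bun's implementation, and the UTF-16 BOM sniffing in `sniffEncoding` is only an assumption about what later support might look like.

```javascript
// Pick an encoding from a leading byte order mark, defaulting to UTF-8.
function sniffEncoding(bytes) {
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return 'utf-16le';
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return 'utf-16be';
  return 'utf-8';
}

function decodeLikeTextImport(bytes) {
  // Non-fatal mode (the default) replaces invalid sequences with U+FFFD;
  // ignoreBOM defaults to false, so a matching leading BOM is stripped.
  return new TextDecoder(sniffEncoding(bytes)).decode(bytes);
}

const invalid = decodeLikeTextImport(new Uint8Array([0x68, 0xff, 0x69]));
// invalid === 'h\uFFFDi'
const withUtf8Bom = decodeLikeTextImport(new Uint8Array([0xef, 0xbb, 0xbf, 0x68]));
// withUtf8Bom === 'h'
```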

Apr 23 '24 09:04 Jarred-Sumner