Public API for compilers

Open kitsonk opened this issue 5 years ago • 42 comments

For tracking purposes, please don't work on this without discussing with Ry or myself.

Having a public API that is similar to how we perform TypeScript compilation is a good idea. It would allow JS->JS transpilation (e.g. those who need Babel custom plugins or Flow) or other languages (e.g. CoffeeScript).

Related to #1738 and some other work in rationalising the compiler APIs internally, but we should be able to support loading a compiler in a web worker and instructing the privileged side what resources should be sent to that runtime compiler.

kitsonk avatar Feb 11 '19 03:02 kitsonk

Dart and Elm are other candidate languages.

daniele-orlando avatar Feb 19 '19 16:02 daniele-orlando

I think Dartlang is better than TypeScript.

islishude avatar Feb 28 '19 12:02 islishude

@isLishude then you are probably looking at the wrong project.

kitsonk avatar Mar 01 '19 02:03 kitsonk

What you really want is an extensible module loading system (not resolution; as already discussed, that is prone to problems). I have quite a few ideas on this one, and even some existing code. I've been working on a similar idea for a while now, and I think the only good way to design this is with a two-stage loading process:

  1. First "load" (what this means really depends on the media type) modules to JS or TS (possibly with embedded Wasm) source, since these are the only formats that the TypeScript compiler understands (other than JSON).
  2. Compile using something similar to the existing ts compiler.

The first step would be the only part that is user extensible.

afinch7 avatar Mar 01 '19 22:03 afinch7

The public compiler API would need to:

  1. Allow a user to register a compiler. That registration would include the extensions and media types that it should compile. For security and consistency reasons, TypeScript media types and extensions should be disallowed, but JSON and JavaScript could potentially be registered. This would be a special type of web worker. I am not certain whether this should be a new op or whether we add non-standard customisations to the options of new Worker(url, options) to provide the appropriate information.
  2. When the media type is encountered, Rust would post a message to the worker providing the module specifier and referrer.
  3. The userland compiler would be able to fetch resources using the fetch_module_meta_data op. We would enforce the read file/network security on this (unlike the built-in compiler).
  4. The userland compiler would post back to Rust, via the web worker API, the compiled module code, source map and any diagnostics.
  5. The userland compiler would need to be able to be unregistered. (I guess we would just use Worker.terminate())
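
A minimal sketch of how the registration steps above might fit together. Everything here is an assumption made for illustration: CompilerRegistry, CompileRequest and CompileResult are hypothetical names, not Deno APIs, and a direct function call stands in for the worker message passing the proposal describes.

```typescript
// Hypothetical sketch: none of these names are part of Deno's API.
interface CompileRequest {
  specifier: string; // module being compiled
  referrer: string; // module that imported it
  mediaType: string;
  source: string;
}

interface CompileResult {
  code: string;
  sourceMap?: string;
  diagnostics: string[];
}

type Compiler = (request: CompileRequest) => CompileResult;

// Media types reserved for the built-in compiler (step 1).
const RESERVED_MEDIA_TYPES = ["application/typescript", "text/typescript"];

class CompilerRegistry {
  private readonly compilers = new Map<string, Compiler>();

  // Registers a compiler and returns a function that unregisters it (step 5).
  register(mediaTypes: string[], compiler: Compiler): () => void {
    for (const mediaType of mediaTypes) {
      if (RESERVED_MEDIA_TYPES.indexOf(mediaType) !== -1) {
        throw new Error(`Media type is reserved: ${mediaType}`);
      }
      if (this.compilers.has(mediaType)) {
        throw new Error(`Media type already registered: ${mediaType}`);
      }
    }
    for (const mediaType of mediaTypes) {
      this.compilers.set(mediaType, compiler);
    }
    return () => {
      for (const mediaType of mediaTypes) {
        if (this.compilers.get(mediaType) === compiler) {
          this.compilers.delete(mediaType);
        }
      }
    };
  }

  // Dispatches to the registered compiler (steps 2 and 4), or returns
  // undefined when no compiler claims the media type.
  compile(request: CompileRequest): CompileResult | undefined {
    const compiler = this.compilers.get(request.mediaType);
    return compiler === undefined ? undefined : compiler(request);
  }
}
```

In a real design the compiler would live in its own worker, so register would hold a worker handle and compile would be a message round trip; the permission checks from step 3 are left out of this sketch entirely.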

kitsonk avatar Mar 02 '19 01:03 kitsonk

You specified separating compilers by media type, but I don't think that extension (media type) is really a reliable way to do this (how would you handle no extension). I guess you could decide on a list of supported media types, but I feel that would be very limiting. You could load modules by manifest and have the manifest tell the loader system what sort of loader to use, but that already sounds way too much like package.json if you ask me. That might be fine if some thought was put into it, but I don't think that's really what the Deno community wants.

In general, the more complicated your expectations get, the more difficult this will be to implement, and the more bugs we will encounter in the process. It might be much more simple to decide that user compilers should be trusted code, and expect them to handle the retrieval of resources required to complete their tasks, thus the nomenclature "loader" would be more fitting. The expectations for said loaders could be as simple as:

  1. Loaders will be given a module's fully qualified URL (this could be generated using native URL parsing, i.e. new URL(specifier, referrer ? referrer : defaultUrl)).
  2. Loaders will also be given a module's referrer information: origin URL, source code, source map, media type, source loader (what loader was used to provide this resource?), etc.
  3. Loaders would be expected to take the information from 1 and 2 and, as accurately as possible, return a TS or JS module that represents that module's URL (or error out if not possible).
  4. Loaders should be given the ability to error out on a request for any reason (and should be encouraged to error out as soon as possible).
  5. Loader priority should be determined by their order in the list of configured loaders, and each loader must be given an attempt and error out before the next one is tried.
  6. Loaders should be designed to be platform agnostic, so they can be integrated into tooling like a TypeScript language service plugin. This would most likely be achieved by having the implementation pass the loaders platform-specific implementations of a shared resource accessor API.

The new dynamic import could be used to load these "loaders" as modules with a defined structure. I tried to describe this as best as possible, but I figured I might be able to better represent my ideas in typescript interfaces and I also included a simple example implementation: https://gist.github.com/afinch7/4356a4377ec20dc336456d4639777578.
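
As a rough illustration of expectations 1, 4 and 5 (and deliberately not taken from the linked gist), the priority chain might look something like the sketch below. Loader, LoadRequest, toRequest and loadWithChain are hypothetical names, and loaders are synchronous here only to keep the example short.

```typescript
// Hypothetical sketch; these names are not from Deno or the linked gist.
interface LoadRequest {
  url: URL; // fully qualified module URL (expectation 1)
  referrer: string | undefined; // URL of the importing module, if any
}

interface LoadedModule {
  mediaType: "javascript" | "typescript";
  source: string;
}

interface Loader {
  name: string;
  // Returns a JS/TS module, or throws to decline the request (expectation 4).
  load: (request: LoadRequest) => LoadedModule;
}

// Builds the fully qualified URL with native URL parsing.
function toRequest(
  specifier: string,
  referrer: string | undefined,
  defaultUrl: string,
): LoadRequest {
  return { url: new URL(specifier, referrer ? referrer : defaultUrl), referrer };
}

// Tries each loader in configured order; a loader must error out before
// the next one is attempted (expectation 5).
function loadWithChain(
  loaders: Loader[],
  request: LoadRequest,
): { loader: string; module: LoadedModule } {
  const failures: string[] = [];
  for (const loader of loaders) {
    try {
      return { loader: loader.name, module: loader.load(request) };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${loader.name}: ${reason}`);
    }
  }
  throw new Error(`No loader handled ${request.url.href} (${failures.join("; ")})`);
}
```

Collecting each loader's failure reason makes the final error explain why every loader declined, which matters once several loaders are configured.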

This would absolve the implementation of a lot of potentially complicated responsibilities, and even allow loaders to make security decisions about the content they are attempting to load (like a browser would if you tried to make a cross-origin request).

A simple approach like this could support just about any use case, and could be universal enough to enable parity between the Deno compiler and a TypeScript language service plugin (seamless, accurate integration in the editor is rare for systems like this). It also wouldn't be limited to JavaScript transpilers or JSON processing; you could very easily compile just about anything into a JavaScript or TypeScript module. FlatBuffers definitions could be compiled to a TypeScript module at runtime, or C++, C, or even Rust source code could be compiled to Wasm and embedded into a JS/TS module at runtime. You could have a nearly completely language-agnostic platform, and give the developer the information to actually use it effectively.

afinch7 avatar Mar 03 '19 19:03 afinch7

This might require a preflight check of sorts to check for redirects #1742 or give deno control of fetching the module "entry point".

afinch7 avatar Mar 03 '19 20:03 afinch7

You specified separating compilers by media type, but I don't think that extension(media type) is really a reliable way to do this

This is exactly how we do it today. media type != extension.

(how would you handle no extension)

Without a media type, that is insecure. We wouldn't want to allow a file like that to be processed... Files without extensions require a media type.
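
To make the distinction concrete, here is a small sketch of media type detection. It is an assumption-laden illustration, not Deno's implementation: detectMediaType and both lookup tables are hypothetical, and the rule used (a recognised Content-Type wins, then the extension, otherwise the module is refused) is one plausible reading of the policy described here.

```typescript
// Hypothetical sketch, not Deno's actual media type logic.
type MediaType = "JavaScript" | "TypeScript" | "JSON";

const BY_CONTENT_TYPE = new Map<string, MediaType>([
  ["application/javascript", "JavaScript"],
  ["text/javascript", "JavaScript"],
  ["application/typescript", "TypeScript"],
  ["text/typescript", "TypeScript"],
  ["application/json", "JSON"],
]);

const BY_EXTENSION = new Map<string, MediaType>([
  [".js", "JavaScript"],
  [".ts", "TypeScript"],
  [".json", "JSON"],
]);

function detectMediaType(specifier: string, contentType?: string): MediaType {
  if (contentType !== undefined) {
    // Drop parameters such as "; charset=utf-8" before the lookup.
    const semicolon = contentType.indexOf(";");
    const essence = (semicolon === -1 ? contentType : contentType.slice(0, semicolon))
      .trim()
      .toLowerCase();
    const fromHeader = BY_CONTENT_TYPE.get(essence);
    if (fromHeader !== undefined) return fromHeader;
  }
  // Fall back to the extension of the last path segment, if there is one.
  const path = new URL(specifier).pathname;
  const dot = path.lastIndexOf(".");
  const extension = dot > path.lastIndexOf("/") ? path.slice(dot).toLowerCase() : "";
  const fromExtension = BY_EXTENSION.get(extension);
  if (fromExtension !== undefined) return fromExtension;
  // No media type and no known extension: refuse to process the module.
  throw new Error(`Cannot determine media type for ${specifier}`);
}
```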

It might be much more simple to decide that user compilers should be trusted code, and expect them to handle the retrieval of resources required to complete their tasks thus the nomenclature loader would be more fitting.

We don't trust our own compiler. It does not do module resolution. That is up to privileged/Rust, as is appropriate.

There is nothing preventing someone from implementing a loader and eval'ing code today with the right Deno permissions. A public API for a userland compiler needs to follow the pattern of the built in compiler.

kitsonk avatar Mar 05 '19 01:03 kitsonk

Those are valid concerns. You want something that doesn't require any user setup or config, and my approach would require user configuration to work thus it falls outside of the deno philosophy.

It will always be distributed as a single executable - and that executable will be sufficient software to run any deno program. Given a URL to a deno program, you should be able to execute it with nothing more than the 50 megabyte deno executable.

I think that pretty much settles what direction Deno should go, but I still have my concerns with the idea of an untrusted compiler. I think my main concern here is: how can you in any way trust the code a compiler emits if you, as an end user, don't have full trust in the compiler?

In general I think we are both on completely different pages right now, so I want to do what is needed to put us all on the same page with this one.

afinch7 avatar Mar 05 '19 17:03 afinch7

[snip]

I still have my concerns with the idea of an untrusted compiler. I think my main concern here is: how can you in any way trust the code a compiler emits if you, as an end user, don't have full trust in the compiler?

[snip]

Just a lurker here, but I think what @kitsonk means by trusted/untrusted isn't what you think. You're right that one can't delegate code generation without the risk that the output will do something unwanted. Ken Thompson addressed this famously in his "Reflections on Trusting Trust" presentation in 1984.

The objective of treating the compiler as untrusted is to limit the damage it can do. Isolating the compiler prevents it from (for example) invoking /bin/bash in hopes of exploiting that unrelated program. The isolation is a reduction in attack surface as part of a defense-in-depth strategy.

The reason it's acceptable to risk nefarious or broken compiler output is because there is no architectural way to avoid the risk. The risk has to be addressed at a different layer, such as via module signing, code review, webs of trust, insurance mechanisms, etc.

I hope my comment is helpful.

rdeforest avatar Apr 21 '19 18:04 rdeforest

I propose to use service worker api to provide the compiler API:

https://github.com/denoland/deno/issues/2676#issuecomment-514059683

oldrich-s avatar Jul 23 '19 05:07 oldrich-s

Service Workers aren't really suitable for a public compiler API. Service Workers are a specific class of Web Workers anyways. The existing compiler is implemented as a web worker, and a specific class of web worker would also be suitable IMO for the public compiler API, which is laid out above.

kitsonk avatar Jul 23 '19 06:07 kitsonk

TypeScript media types and extensions should be disallowed

Strongly disagree with this. Put it behind a permission flag, but we should have the possibility to use the existing babel ecosystem or other tools to preprocess TypeScript files.

brandonkal avatar Jan 15 '20 08:01 brandonkal

It could be useful to be able to use a custom compiler for TypeScript as well, like ttypescript or reflec-ts. I would generally avoid it for performance reasons, but it might be useful for some experiments, unless there is a bootstrapping problem, i.e. the chicken-and-egg problem of the main entry file that defines a custom TS compiler itself being compiled by the built-in TS compiler. Some things could be done with custom transformers if supported by #2089/#2927/#3442.

rsp avatar Jan 15 '20 09:01 rsp

I don't know if I'm in the right place, but I would like to write ClojureScript and directly or indirectly run it through deno! 🙃 🙌

JimLynchCodes avatar Feb 10 '20 01:02 JimLynchCodes

If there is a JavaScript based Clojure compiler, then that would likely be possible to accomplish with this feature.

kitsonk avatar Feb 10 '20 02:02 kitsonk

@kitsonk I assume this isn't a goal for 1.0 is it?

Soremwar avatar Apr 15 '20 19:04 Soremwar

No, as indicated by its future milestone.

lucacasonato avatar Apr 15 '20 19:04 lucacasonato

@kitsonk People want Dart support for several reasons:

  1. Dart backend frameworks are not popular in the community.
  2. Node.js is very good, but its tutorials are mostly JS and TS; there are not enough for Dart.
  3. Flutter is a frontend for all platforms: web, desktop, and mobile.

For all of the reasons above, Flutter developers want to become full-stack developers overnight, without the extra effort of tracking TS 3.xx or ECMAScript 2020. We already have to learn a lot to be universal frontend developers, such as native Android, iOS, and so on.

So what could be cool about Deno over Node.js for the Flutter community? Early support means an early community, libraries, tutorials, and tools in Dart.

OK, that's cool, but what will Deno gain from the Flutter community?

  1. A push for Fuchsia OS support for Deno, the same as the Node.js support.
  2. Tools from the Flutter community. For example, I am using Flipper to track my application state and database; it is made by Facebook for React Native, but I use it for Flutter.
  3. More contributors and a big push to the project. The Flutter community is very big, and AngularDart's is too.

I know your team is working hard and I just want to share my ideas with you. Thanks a lot to you and your team for your hard effort.

amreniouinnovent avatar May 21 '20 08:05 amreniouinnovent

Just a thought, nodejs has an experimental loaders feature. Why not model the public compiler api after this? Deno shouldn't care about supporting any other languages; instead it should expect the output of a loader to be consumable JavaScript (or TypeScript).

andykais avatar May 28 '20 14:05 andykais

I see that it is expected that the compiler is written in JavaScript, I believe that is... concerning?

Love it, or hate it, but TSC is slow. For a transpiler that simply does three main things:

  • checks the types of code
  • inlines constant enums
  • modifies TS concepts such as namespaces into valid JS

It has to be among the slowest pieces of software that I have dealt with to-date.

In contrast, any compiler that is written in native code for a sensible language is blazingly fast. Of course, there are exceptions (ex: C++), but this is generally true.

Let's say that I thought that it was a good idea to hook-up rustc and Wasm-bindgen to Deno. (this idea actually sounds pretty neat) All that the JS would see is something like this:

import { foo } from "./mod.rs";

foo(42);

All that I would need to do is direct what file the compiler needs to read.

This also brings up something that was mentioned earlier:

  1. The userland compiler would need to be able to be unregistered. (I guess we would just use Worker.terminate())

Suppose I were using Deno's (unstable) native plug-in API to run the native compiler: if the Worker were terminated while executing the native plugin, the native code would continue executing, correct? A better deregistration system would be needed, something that would allow the compiler to clean up after itself and/or kill other processes before being terminated. Even a JS compiler might need to .terminate some other workers. (Does .onclose work in Deno workers?)
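
One possible shape for such a deregistration system, sketched under heavy assumptions: CompilerHandle, onClose and unregister are hypothetical names, and the terminate callback stands in for whatever would actually stop the worker (for example Worker.terminate()). Cleanup callbacks are synchronous here for brevity.

```typescript
// Hypothetical sketch: run registered cleanups before the worker is terminated.
type Cleanup = () => void;

class CompilerHandle {
  private cleanups: Cleanup[] = [];
  private closed = false;
  private readonly terminate: () => void;

  constructor(terminate: () => void) {
    this.terminate = terminate;
  }

  // Lets the compiler register work to do before it is shut down,
  // such as killing a child process or terminating nested workers.
  onClose(cleanup: Cleanup): void {
    if (this.closed) throw new Error("Compiler is already unregistered");
    this.cleanups.push(cleanup);
  }

  // Runs cleanups in reverse registration order, then terminates.
  // A failing cleanup is recorded but does not block termination.
  unregister(): string[] {
    if (this.closed) return [];
    this.closed = true;
    const failures: string[] = [];
    for (const cleanup of this.cleanups.slice().reverse()) {
      try {
        cleanup();
      } catch (err) {
        failures.push(err instanceof Error ? err.message : String(err));
      }
    }
    this.cleanups = [];
    this.terminate();
    return failures;
  }
}
```

A production version would probably need asynchronous cleanups with a timeout, so that a misbehaving compiler cannot delay termination indefinitely.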

Yet, while I doubt that many developers would be dealing with native code, as opposed to JS, TS, Dart, or another language that transpiles to JS, this may still be an avenue worth looking into.

ghost avatar Dec 03 '20 23:12 ghost

It is true that tsc is slow, however, there are very good reasons for why an alternative (fully equivalent, e.g. typechecking) implementation does not exist yet:

  • TypeScript has no formal spec. Even implementing a TS parser is difficult (I've done it and I can say it's painful); knowing how to typecheck is even harder.
  • TypeScript has an absolute ton of random rules that its typechecker enforces, and a ton of edge cases which need to be handled.
  • The TypeScript repository is downright painful to read, and the tsc team does not have a plan to change it (according to them, it's for performance). Here are a couple of examples why:
    • All of the tests used by the type checker and friends are in a single folder (that's 36k+ files!)
    • The parser is all in a single file that's 8+ kloc
    • The type checker is a single file! 40 kloc of type checking in a single file
  • The TypeScript type checker receives a ton of commits; it's difficult to keep up with all the changes that happen.

It may seem as simple as just "rewrite it in Rust", but it's surprisingly difficult. The only implementation I know of which is trying to do this is a closed-source implementation by @kdy1. It is relatively simple to convert TS to JS; swc can do this already, and Deno uses swc for this. The hard part is type checking. I would love to have a fully working TS type checker in Rust, as I am making a JS/TS linter, and I am hopeful that this can be achieved in the future.

RDambrosio016 avatar Dec 03 '20 23:12 RDambrosio016

Just a thought, nodejs has an experimental loaders feature. Why not model the public compiler api after this?

Hi, just noticed this. FYI, I’m responsible for much of the design of Node’s ESM loaders. We’re working on a redesign that collapses the four hooks (resolve, getFormat, getSource, transformSource) into two: resolve and load, or perhaps resolveToURL and loadFromURL. The idea behind the redesign is that depending on the flow, like for example whether you’re loading from disk or from HTTPS, you might need to load the source contents before determining format or vice versa. A WIP PR is at https://github.com/nodejs/node/pull/35524 and a tracking issue is at https://github.com/nodejs/node/issues/36396.

I’m not sure Deno has Node’s need to determine format (which isn’t just CommonJS vs. ESM, but also Wasm or builtin or JSON), so for Deno maybe the more granular hooks might work better, so that a plugin can register itself only to use a “transform” hook and leave all the resolution and loading from disk/network for Deno to handle. Anyway I just thought I’d say hi and let you know what we’re (very slowly) working on, and if there’s any way we can help please ping me or the others on that PR thread.
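
A rough sketch of that "transform only" idea, with the host keeping resolution and loading. It does not reflect Node's actual hook signatures or any Deno API: Hooks, HookPlugin, composeHooks and importSource are hypothetical, and the hooks are synchronous to keep the example small.

```typescript
// Hypothetical sketch: plugins may supply any subset of the hooks.
interface Hooks {
  resolve: (specifier: string, referrer: string) => string;
  load: (url: string) => string;
  transform: (url: string, source: string) => string;
}

type HookPlugin = Partial<Hooks>;

function composeHooks(host: Hooks, plugins: HookPlugin[]): Hooks {
  // For resolve and load, the last plugin that provides the hook wins;
  // otherwise the host's default stays in place.
  let resolve = host.resolve;
  let load = host.load;
  for (const plugin of plugins) {
    if (plugin.resolve) resolve = plugin.resolve;
    if (plugin.load) load = plugin.load;
  }
  // Transforms are chained: each one receives the previous output.
  const transform = (url: string, source: string): string => {
    let output = host.transform(url, source);
    for (const plugin of plugins) {
      if (plugin.transform) output = plugin.transform(url, output);
    }
    return output;
  };
  return { resolve, load, transform };
}

// Runs the full pipeline for one import: resolve, then load, then transform.
function importSource(hooks: Hooks, specifier: string, referrer: string): string {
  const url = hooks.resolve(specifier, referrer);
  return hooks.transform(url, hooks.load(url));
}
```

A plugin that only defines transform never sees the network or the file system, which fits the security model discussed earlier in the thread.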

GeoffreyBooth avatar Dec 04 '20 00:12 GeoffreyBooth

@GeoffreyBooth thanks for the input! This is a really slow-burn issue for Deno, but it would be great to align. I think "resolve" and "load" hooks would be great. @piscisaureus did some thinking and POCing around this semi-recently.

Deno currently doesn't need a way to determine format; effectively, a single media type resolves to a specific format, or it is expected that whatever transforms it can handle whatever variants of that media type exist. I will take a look at the PR thread!

kitsonk avatar Dec 04 '20 02:12 kitsonk

@RDambrosio016 Trust me, I fully understand (I once looked at their 8kb+ block). Even trying to parse plain JS requires a whole lot of code. TS is still a relatively dynamic language, so it's not easy to type check. I was using TSC as an example of a poorly performing "compiler" written in JS.

The key point is I believe that Deno should try to support more than just JS compilers, or at least make it easier to use something that isn't JS.

There are also plenty of other compilers that emit JS that I believe are not written in JS, for example, dart2js and the ClosureScript compilers. (I may be wrong)

ghost avatar Dec 04 '20 08:12 ghost

I guess this is kind of like require hooks in Node.js. I think another use would be for template engines that compile to JS functions, so you don't have to read the file and compile it; you can just import it.

shadowtime2000 avatar Jan 03 '21 22:01 shadowtime2000

How far away are we from this happening?

auvipy avatar Sep 11 '21 11:09 auvipy

Quite a lot.

kitsonk avatar Sep 11 '21 22:09 kitsonk

@kitsonk it looks like the module resolution API for your deno_graph module is pretty much there, no? I've used the module a little bit and it seems to work quite smoothly. Can that be reworked back into deno?

mimbrown avatar Mar 04 '22 18:03 mimbrown

@mimbrown it is already part of Deno, as it is what is used to do module resolution, but directly as a Rust crate.

We are unlikely to expose it as an internal API, because it is available as a JavaScript/WASM API.

kitsonk avatar Mar 05 '22 08:03 kitsonk

Yes I am aware, sorry my comment was not at all clear. What I meant was, the createGraph function exposed by the deno_graph module has a set of options that allow for user-defined module resolution and loading. You're giving users hooks to override the default behavior. I'll copy the options interface here:

interface CreateGraphOptions {
  /**
   * A callback that is called with the URL string of the resource to be loaded
   * and a flag indicating if the module was required dynamically. The callback
   * should resolve with a `LoadResponse` or `undefined` if the module is not
   * found. If there are other errors encountered, a rejected promise should be
   * returned.
   *
   * @param specifier The URL string of the resource to be loaded and resolved
   * @param isDynamic A flag that indicates if the module was being loaded
   *   dynamically
   */
  load?(
    specifier: string,
    isDynamic: boolean,
  ): Promise<LoadResponse | undefined>;
  /** The type of graph to build. `"all"` includes all dependencies of the
   * roots. `"typesOnly"` skips any code only dependencies that do not impact
   * the types of the graph, and `"codeOnly"` only includes dependencies that
   * are runnable code. */
  kind?: "all" | "typesOnly" | "codeOnly";
  /** When identifying a `@jsxImportSource` pragma, what module name will be
   * appended to the import source. This defaults to `jsx-runtime`. */
  jsxImportSourceModule?: string;
  /** An optional callback that will be called with a URL string of the resource
   * to provide additional meta data about the resource to enrich the module
   * graph. */
  cacheInfo?(specifier: string): CacheInfo;
  /** An optional callback that allows the default resolution logic of the
   * module graph to be "overridden". This is intended to allow items like an
   * import map to be used with the module graph. The callback takes the string
   * of the module specifier from the referrer and the string URL of the
   * referrer. The callback then returns a fully qualified resolved URL string
   * specifier or an object which contains the URL string and the module kind.
   * If just the string is returned, the module kind is inferred to be ESM. */
  resolve?(specifier: string, referrer: string): string | ResolveResult;
  /** An optional callback that can allow custom logic of how type dependencies
   * of a module to be provided. This will be called if a module is being added
   * to the graph that is non-typed source code (e.g. JavaScript/JSX) and
   * allow resolution of a type only dependency for the module (e.g. `@types`
   * or a `.d.ts` file). */
  resolveTypes?(specifier: string): TypesDependency | undefined;
  /** An optional callback that returns `true` if the sub-resource integrity of
   * the provided specifier and content is valid, otherwise `false`. This allows
   * for items like lock files to be applied to the module graph. */
  check?(specifier: string, content: Uint8Array): boolean;
  /** An optional callback that returns the sub-resource integrity checksum for
   * a given set of content. */
  getChecksum?(content: Uint8Array): string;
  /** An optional string to be used when generating an error when the integrity
   * check of the module graph fails. */
  lockFilename?: string;
  /** An optional record of "injected" dependencies to the module graph. This
   * allows adding things like TypeScript's `"types"` values into the graph. */
  imports?: Record<string, string[]>;
}

The load, resolve, and resolveTypes hooks look like they were at least somewhat inspired by @GeoffreyBooth's comment above. You can tell me if I'm wrong. Anyway, this API seems like it would work almost as-is for a custom Deno loader. This is what I'm envisioning:

/** @file myLoader.ts */

export function load(specifier: string, isDynamic: boolean): Promise<LoadResponse | undefined> {
  // custom logic.
}

export function resolve(specifier: string, referrer: string): string | ResolveResult {
  // custom logic.
}

Used as:

deno run --loader=./myLoader.ts ./myScript.ts

Deno would then look at the loader file and override the default behavior depending on the functions that are exposed. I think it's already been noted that Workers aren't actually a great solution to this, even though they seem like they would be at first glance. This seems to be the simplest solution I can see, and it's already working fine for the deno_graph module.
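
To illustrate "override the default behavior depending on the functions that are exposed", here is a minimal sketch. The --loader flag above is only a proposal, and applyLoaderModule is a hypothetical helper; LoadResponse is a simplified stand-in for the deno_graph type, and resolve returns a plain string rather than string | ResolveResult.

```typescript
// Hypothetical sketch of merging a user's loader module over the defaults.
interface LoadResponse {
  specifier: string;
  content: string;
}

interface LoaderHooks {
  load: (specifier: string, isDynamic: boolean) => Promise<LoadResponse | undefined>;
  resolve: (specifier: string, referrer: string) => string;
}

// Any hook the loader module exports replaces the default; hooks it does
// not export keep the built-in behaviour.
function applyLoaderModule(
  defaults: LoaderHooks,
  loaderModule: Partial<LoaderHooks>,
): LoaderHooks {
  return {
    load: loaderModule.load ?? defaults.load,
    resolve: loaderModule.resolve ?? defaults.resolve,
  };
}
```

In practice loaderModule would presumably be the namespace object obtained by dynamically importing ./myLoader.ts, so its exports would need runtime validation before being trusted as hooks.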

mimbrown avatar Mar 05 '22 17:03 mimbrown

https://nodejs.org/dist/latest-v18.x/docs/api/esm.html#loaders

https://rollupjs.org/guide/en/#plugins-overview

masx200 avatar Jul 17 '22 01:07 masx200

I think ReScript will be the next big thing. Its syntax is very similar to Rust, and many Rust developers like it, and it is powered by Facebook and moves forward with the React ecosystem.

Leo-Mu avatar Jul 28 '22 06:07 Leo-Mu