goja icon indicating copy to clipboard operation
goja copied to clipboard

Modules support

Open mstoykov opened this issue 2 years ago β€’ 14 comments

This is a WIP PR implementing ECMAScript Modules (ESM) in goja.

The PR does add every ESM feature as of ES2022 including:

  • basic import/export syntax.
  • module namespace import * as ns from "somewhere".
  • dynamic import.
  • async module evaluation, even though the source text module do not support top level await. This definitely needs more testing as it was added fairly recently and required ... a bit of refactoring.

The WIP part of this PR comes from the fact that there are many open implementation specifics that I am not certain about and at least for some of them I would need your input, @dop251.

(For not @dop251, while I will appreciate any input, if you just want to try it out - this might be a bit hard as there is no documentation. Also, as mentioned below a lot of the APIs will change. But if you really want look at modules_integration_test.go as that likely is the best place for a small(ish) but fully working example)

Why open an unfinished PR

I have worked on this on and off since I guess December, but mostly in the last 3 months. Unforunately while I did manage to implement most stuff I just kept getting more and more questions that likely I should've asked earlier. Also, I am getting more and more other work that also needs to get done. Because of that I decided to try to get some feedback on the current implementation in hopes that will make things easier for both of us.

I will try to ask most of the questions inline/incode, this hopefully will help with threading, but we can also move any discussion in another place if you want to. For some of the questions I have - I also have ideas, which I will probably go with when I have the time even if you haven't come back to me. Also, all my links will be to my branch, because links to code in big PRs just plain don't work, and you would likely want to open them locally either way.

Again I will continue working on this, but hopefully with some help the time it takes me will be smaller. It likely also will make this easier for you to review in the end ;)

Tthere are a bunch of:

  • commented fmt.Print* lines that will be removed in any final PR.
  • TODOs coments which will also be removed as I go through them.
  • big functions that will likely be broken in smaller ones.

You can probably ignore those unless you want something to stay or want to given some specific feedback there.

I have tried to not change anything in goja that I don't need, but I still needed to touch stuff which aren't directly about modules. Hopefully all of are fine.

Really sorry for the formatting changes in some places. I have tried to bring them to a minimal as to not just move code around. I should've probably disabled gofumpt for this entire project and that would've helped more πŸ€·β€β™‚. I can probably try to remove them one final time after everything else is done.

This whole PR should be squashed in the end. Some commits are kind of useful if you need to figure out what I have tried, but most of it is likely very outdated and will definitely not be useful once we are done with this feature.

I guess first:

Are you interested in goja having ESM support in general? And do you see any fundamental issues with the approach I have taken and have not asked questions for? It's quite a big change, I will fully understand if you don't want it or, do not have time to code review it and help me right now or until I am actually of the opinion I am done.

Background on why I am working on this and a particular problem I still need to fix in k6

I specifically wanted to work on this as in k6 we do have this, I guess unfortunate, requirement that import "commonjsModule" needs to work as well as require("ESMmodule"). Cycles of both of them aren't that necessary to be supported (although I guess it will be nice if I can make them work πŸ€·β€β™‚).

Due to k6 currently using babel internally to transpile ESM to CommonJS, users have been using both and because transpiling takes time we have historically made helper modules as CommonJS :(. So this all leads to the fact that some interoperability is heavily required.

Luckily I haven't really done anything to the code design to make it possible πŸ€·β€β™‚it just kind of works on the most basic level as shown in the k6 PoC PR

That k6 PR apart from lacking a ton of tests, mostly to test cycles and interoperability, has only 2 known bugs/problems - so probably a lot more 😒.

Both open and require should be relative to the script/module that is the current "root" of execution, instead they are currently (in that PR) relative to the file that is the main script/module - the argument to k6 run this/file/here.js.

This is one of those really hard to explain with just words problems. But I basically need activeScriptOrModule, but instead of returning the module we are currently "in" (as in the code/text place), I need the one that is the root of the execution.

And yes I (now) know that require is supposed to be relative to the file it's written in just like import()... but that is literally not how it has been since k6 was made and even if we change that ... open still exists. I have opened an issue to figure out if k6 would not want to change this.

You do currently do this in goja_nodejs here, but I am not sure this works with source maps (it seems to, I guess I am wrong where the "rename" happens πŸ€·β€β™‚) but in order to get the "root" module/script I will need to go through the whole stack ... which might be big :(. I kind of feel like that is an okay thing for goja to provide. What do you think?

(Note: currently k6 can know which the currently executing script file as it is the only thing that is executing stuff, once goja starts evaluating cyclic modules this will no longer work)

Edit: I just posted all of the comments inline and noticed, while copying the messages I had prepared, that the order of the code in the PR as always does not in any way reflect how the code actually flows :facepalm:

I would recommend looking at something like modules_integration_test.go to see how the code is used and then following the code from there. I would argue also that the most important comments and questions are in modules.go and compiler.go.

Also thanks again for working on goja :bow:

mstoykov avatar Sep 07 '22 15:09 mstoykov

yummmy!!!

cdle avatar Sep 09 '22 01:09 cdle

I would definitely like to have the modules functionality (because it's part of the specs), however considering modules are all asynchronous in the current version of the specs, the logical order of implementation should be:

  1. Generators (which includes execution context, i.e. ability to save and restore the state of execution, which in turn will require changing the exception handling).
  2. async/await.
  3. Modules.

I am planning to do the generators next (but as per usual, no ETA), and eventually the rest, although the remaining 2 items are lower priority. If you (or anyone else) would like to pick them up afterwards, that would be most welcome.

As for the implementation, I only had a chance to have a quick look, one thing that strikes me is the indirection mechanism, I think it should be given some more thought. Not only it looks a bit hacky, I don't think it would work with dynamic resolution (i.e. eval("importedName"). But to stress, without 1 and 2 any implementation would be premature and would require extra work afterwards.

dop251 avatar Sep 14 '22 21:09 dop251

Hi, first thanks for the response :bow:

I would definitely like to have the modules function

Great!!!

But to stress, without 1 and 2 any implementation would be premature and would require extra work afterwards.

I would like to push back on this. In practice while the current implementation let them be asynchronous that isn't what will be the case most of the time and IMO top-level-await support can be added when async/await is added.

But ESM definitely touches a lot of stuff and each feature addfed makes this even bigger. For example when you added classes (thanks again :bow: ) I was more at dynamic imports and needed to merge the classes. I am happy to say that it was mostly just merging.

It will definitely be more work when generator and async/await are added, but I would argue probably very little compared to the rest of the things needed in that.

That is to say that I am fairly certain no big changes will be needed to the module code apart from in the particular places that this feature interact:

  1. parsing the extra modules stuff and moving it through the code
  2. some changes to how the execution of the source text module happens so it can use top level await.

But the actual support for tla from the modules perspective is there ... albeit untested and that definitely needs to be done.

And again the whole PR needs a lot more tests and if we do the refactoring of the types and names in modules.go likely quite a lot more changes as well.

I also have problems with teh current asynchronous nature of the evaluation as it makes us use Promises, which in itself is nice, but that means that we need to cast types everywhere where in reality there should be only 2 possibilities.

As for the implementation, I only had a chance to have a quick look, one thing that strikes me is the indirection mechanism, I think it should be given some more thought. Not only it looks a bit hacky, I don't think it would work with dynamic resolution (i.e. eval("importedName").

Unfortunately you are right :facepalm: and tc39 tests again don't cover this or are disabled for some other missing feature :scream: .

Also unfortunately that means that I will now need to figure out how eval resolves bindings, which I think I did have some idea at some point :(. Any suggestion on this will be nice though

For the record I do have a WIP (not a PoC) implemetnation of generators - I am at the "compile generators" part so ... likely many free afternoons away from it beign a PoC :(.

And as I have said we are intersetd in async/await and will work on it if possible (I just also know that it is dependant on generators), but also modules are the last thing we use babel for in k6 atm. And unfortunately with our current version of babel(which we really don't want to go through updating it for many different reasons) means that it supports syntax from 5+ years ago :(. Also this is kind of big change on our side as well.

But if you are absolutely possitive we need generators and async/await first I guess I will argue for pivoting some more resources (me) to implemeting those.

mstoykov avatar Sep 15 '22 07:09 mstoykov

@mstoykov I'm always quite excited when you make changes here because you've added a few great things. I have a fair bit of totally unrelated dev experience, and yet when I've thought about contributing to goja, I never quite get past the learning and absorbing hump into starting something.

You talked a bit about files and resolving paths etc. However, I'd be very very happy if this sort of thing was solved with something like the fs.FS interface. You could have a default set to nothing (security?) or to something standard, but your project would be able to pick something else. Someone out there might decide that they instead want to write their own implementation of FS or use S3 that lazy loads into memory for all I know. Someone might want everything embedded into the binary. All of this is easy to accomplish with the FS interface. I suspect you would get far more flexibility while also achiving a cleaner feeling result at the same time.

  • https://benjamincongdon.me/blog/2021/01/21/A-Tour-of-Go-116s-iofs-package/
  • https://pkg.go.dev/io/fs@master#FS

I would actually be very slightly sad to see goja get proper module support without the ability to use embed, and at the same time, I also assume this wouldn't be the default or even a very high priority, but for some it would be huge, so I very much hope fs.FS is the answer that supports 99% of all use cases without any real downsides.

PaluMacil avatar Oct 10 '22 12:10 PaluMacil

Hi @PaluMacil, sorry for the really slow reply :(.

Glad you are excited, unfortunately I am not certain it is going to go in goja any time soon as indicated by the comments above :(. I also have had a lot less time to actually work on this and related problems than expected :(.

The thing is that the ECMAScript specification is very ... vague on what module specifiers are or should be interpreted as.

The most it does it this sentence

Multiple different referencingScriptOrModule, specifier pairs may map to the same Module Record instance. The actual mapping semantic is host-defined but typically a normalization process is applied to specifier as part of the mapping process. A typical normalization process would include actions such as alphabetic case folding and expansion of relative and abbreviated path specifiers.

Which in practice tell us that specifiers:

  1. can be normalized which is 100% host defined
  2. a single specifier(+referencingScript/Module) MUST always return the same instance of a module (record)
  3. different specifiers(+referencingScript/Module) can returns the same instance of a module(record) as another specifier(+referencingScript/Module)

That basically means that ./file.js or file.js or file or /path/to/lcaol/dir/file.js might refer to the same module or they might not.

There are only 4 mentions of the word "file" in the whole specification and none of them are related to modules.

And arguably a lot of the implementations will want non file based modules. We at k6 have k6/http for example which allows you to make http requests. Node has modules such as "crypto" which let you do cryptography stuff.

I do ask in the PR description about whether goja should provide some kind of default implementation of the module specifier resolution mechanism that just defaults to using os/fs and there is even tests that more or less do that.

But making this the default behaviour IMO is not on the table - you will need to configure your goja Runtime to do this when you use it.

p.s. you can for example see a fairly complex resolution mechanism - the node one here and this is (from my memories) a simplified variant of what they do for require().

mstoykov avatar Oct 24 '22 14:10 mstoykov

hi, @mstoykov. I look forward to more exciting features in your k6!

(sync)generator, async is now available. can you restart the esmodule?

arukiidou avatar Feb 06 '23 14:02 arukiidou

@arukiidou there are plans for this to be worked on in the following weeks.

mstoykov avatar Feb 06 '23 14:02 mstoykov

@mstoykov Since you might revisit this, I wanted to revisit what I mentioned regarding using https://pkg.go.dev/io/fs#FS for configuring the module loader. When you said that you didn't think a specific implementation could be in goja itself, I kept wanting to circle back to point out that io.FS is an interface, not an implementation. If you accept the io.FS interface (io package, with just one method required: Open(name string) (File, error)), then one could use embed, a directory (os.DirFS which might be what you thought I meant originally), S3, a url, or anything else you want since it's a standard way to do this and there are options already written in or out of the standard library already. It would be maximally flexible. You could make an implementation in another repo that mimics webpack or systemjs. You could use something from the standard library out of box, or you could use myriad other options even if they don't mimic something from the node or web world.

PaluMacil avatar Feb 09 '23 14:02 PaluMacil

Just wanted to ping you @dop251.

I have removed the not-quite-yield hack I had and have made eval working as far as I can see.

Also top-level-await is also working.

AFAIK any modules features currently is working or .,.. I don't know about it :sweat_smile:.

The code definitely needs way more work. But I want to touch bases on what direction the current base should be moved into.

On my "short" TODO list are:

  1. realign the current implementation more with the current specification. While the code works, the specification changed twice over the time this has been worked on, so definitely some things are now called differently or are slightly differently explained. One of those changes helped me a lot as it specified more clearly how a thing should work, I expect other such will be also helpful. I just haven't had time to go through the code from one end to the other to check everything.
  2. I do think the current exportIndirect and co. are not great - they were what I managed to do over a year ago, understanding way less than I do now. Unfortunately even now I would argue I do not have a much better way to do it that probably won't be even more performance degrading. Arguably for only source modules we can directly reference the stash of another module. Any ideas or pointers will be great.
  3. The API that is exposed to the end user evolved ... uh ... naturally aka it is pretty crappy. Forget that it doesn't have any documentation (which also is a problem) it is way worse then before as it now have even more promises everywhere. Considerable amount of the work was to try to keep the error path to always resolve with an actual error instead of maybe an error maybe something else. The plan here is that I probably should be wrapping more of an object with function fields which will be way easier to work with then a bunch of interfaces. But more importantly won't require the user to implement multiple interfaces where most of the methods return false/nil or something similar.
  4. Every inline TODO

Any feedback is welcome, I am writing this now, as I will be away for a bit leaving you with some time and also managed to get everything in the working but pretty ugly state.

There is a newer k6 PoC PR that hacks around some stuff and can be used as an example (on top of the integration tests) of how this should be used until I write documentation.

p.s. to everybody else - sorry for not coming back sooner, but making this work seemed more important than discussing the final API. Hopefully after a few rounds with @dop251 we will get to your points as well.

mstoykov avatar Nov 16 '23 17:11 mstoykov

Thanks for working on this. Unfortunately I'm rather time constrained at the moment, and it's a big PR to review, but I'll get to it as soon as I can. It would be nice to close this final gap so that ES6 is fully supported.

dop251 avatar Dec 14 '23 10:12 dop251

Its been a couple months now, what's the status of this?

gabereiser avatar Feb 18 '24 16:02 gabereiser

when can mr?

godLei6 avatar Mar 01 '24 15:03 godLei6

I am also wondering about when this PR will merge

zakiRefai avatar Mar 14 '24 16:03 zakiRefai