node-chakracore icon indicating copy to clipboard operation
node-chakracore copied to clipboard

Need to add support for es modules

Open MSLaguana opened this issue 8 years ago • 14 comments

Node 8.5.0 is adding support for es modules, but the approach they are taking using v8 APIs does not fit well with ChakraCore's approach to modules.

In particular, ChakraCore combines parsing a script and linking a module with its dependencies into one JSRT call, while node currently relies on the ability of v8 to separate them, so that it can parse a module script, extract the dependencies, prefetch them, and then link them ~synchronously.

One solution to this might be speccing out napi support for es modules and using that within node.

MSLaguana avatar Sep 12 '17 18:09 MSLaguana

JS spec has instantiate as a synchronous call separate from parsing. I'm not sure how the approach should differ? Maybe some links to APIs would help me.

bmeck avatar Sep 18 '17 19:09 bmeck

This would seem to be an issue on Chakra-Core's end? I'm sure @bmeck can correct me if wrong but I'm pretty sure we need them to be separate to import CJS inbetween ESM trees correctly?

Fishrock123 avatar Sep 18 '17 19:09 Fishrock123

@Fishrock123 they don't have to be separate, but we would need additional hooks and loading would block main thread while you fetch graph it looks like.

bmeck avatar Sep 18 '17 20:09 bmeck

@boingoing can you help point @bmeck to the right APIs here? I think it's this API but I don't think we've documented them yet: https://github.com/Microsoft/ChakraCore/blob/master/lib/Jsrt/ChakraCore.h#L141

cc @liminzhu

digitalinfinity avatar Sep 18 '17 21:09 digitalinfinity

I'd be interesting in helping with this; if help from an external contributor is welcome.

I'm now quite familiar with ChakraCore's module loading APIs and internals.

If useful here's how it works, roughly. Parsing modules with CC:

  1. CC notifies host to fetch the module
  2. Host must check if the module has been loaded already or not 3a. If module not yet loaded, host creates a new module record OR 3b. If module already loaded host returns existing record
  3. When there are pending modules with records that are not yet parsed host should call JsParseModuleSource, logic of Parse module source is non-simple so some pseudocode (the real code looks rather different but here is the basic logic):
function ParseModuleSource(module)
{
  ParseModule(module);
  var dependencies = identifyDependencies(module);
  if(areParsed(dependencies))
  { 
    if(isRootModule(module) || isDynamiclyImportedModule(module))
    {
      ModuleDeclarationInstantiation();
    }
    else
    {
      var parents = identifyParents(module)
      for(var i = 0; i < parents.length; ++i)
      {
        NotifyParent(parents[i], module);
      }
    }
  }
  else
  {
    NotifyHostToFetchDependencies(dependencies);
  }
}

function NotifyParent(parent, module)
{
  var dependencies = identifyDependencies(parent);  
  if(areParsed(dependencies))
  { 
    if(isRootModule(parent) || isDynamiclyImportedModule(parent))
    {
      ModuleDeclarationInstantiation();
    }
  }
}

Key point to see is that you don't have to load and parse the whole dependency graph in one go - ModuleDeclarationInstantiation gets automatically called when the last module in the dependency graph has been parsed not every time you call ParseModuleSource. 5. When all modules are parsed and MDI has occurred the host has to call JsModuleEvaluation

rhuanjl avatar Jan 25 '18 18:01 rhuanjl

Once we do get our support for modules into a state that node-chakracore can use, we need to re-enable a number of tests in test/parallel. Some new tests of vm support for modules have had to be skipped due to lack of support in node-chakracore.

MSLaguana avatar Feb 09 '18 03:02 MSLaguana

This support is probably going to need both Jsrt-side and Node-ChakraCore-side changes. @rhuanjl your understanding looks right to me. The first step to getting ESM working in Node-ChakraCore is probably to understand exactly where in that process ChakraCore differs from v8.

boingoing avatar Feb 09 '18 19:02 boingoing

I didn't have anything better to do so I tried to work out how v8's module implementation works and how node implements it. I could't find any good documentation for it; this seems to be the best there is: https://v8.paulfryzel.com/docs/master/classv8_1_1_module.html the actual code was probably better.

I'm not 100% positive that I've got this right I found the v8 code somewhat confusing - possibly just because it's so different to the CC code I was trying to compare it to.

Module loading in node with v8

  1. Resolve the Url and Fetch the module
  2. Create a module object and Parse the source with v8 method ScriptCompiler::CompileModule
  3. "Link" the module - this entails: a) Using v8's method GetModuleRequestsLength() to find the number of dependencies b) Running a loop using v8 method GetModuleRequest(i) to get the specifier for each dependency and then await a "ModuleJob" for it - a "ModuleJob" is a function returning a promise that is resolved once the module has been Fetched, Parsed and Linked.
  4. await the resolving of all ModuleJobs
  5. Call v8 method module::Instantiate(context, callback) - this method blocks until the module is instantiated - the callback is used to fetch dependent modules and is expected to synchronously return them already Parsed.
  6. Evaluate the module.

There are a few other bits and pieces going on - e.g. the namespace of the module is also provided via GetModuleNamespace() - but this is the bulk of the process (assuming I've understood it correctly)

How to do this with CC? At first glance it looks quite difficult to replicate the same behaviour as this with CC without being able to edit the node code as the interface for module loading in CC is totally different - not just within Jsrt but in the underlying logic in SourceTextModuleRecord.

Hurdles to overcome:

  1. Small issue: Jsrt has no equivalent of GetModuleRequestsLength(), GetModuleRequest(i) and GetModuleNamespace() - all fairly easy to implement.
  2. Larger issue: ParseModuleSource in CC will fire off a callback expecting to have the host return a module record - this module record does not require to have any source at this point BUT CC will expect this module record to be used later when the Source for that dependency is parsed, I think this callback would need to be set up in chakra-shim and would need to create the module record and store it in some kind of table of module records retained in chakra-shim to make this work.
  3. Node expects instantiation to be done by calling a different function once all modules are parsed - CC will fire it off when there are no pending modules. We could block this by making a dummy module in chakra-shim and setting it as a child of all the other modules - then the equivalent of Instantiate will be to call ParseSource on the dummy.

rhuanjl avatar Feb 11 '18 23:02 rhuanjl

I think I've worked out how to do all of this with no further changes to CC other than the JsGetModuleNamespace() function which is now in master.

The behaviour underneath will not be the same - BUT the observable behaviour should be.

I'll see if I can get it to work using current chakracore master will likely take me a little while - code i need to write for the shim will be slightly involved.

Update I'm running up against the fact that I'm not actually that good with C++ here, if this was C I could do it... I know what I need to implement but realistically will take me a while to do in C++ as I need to work out how to do various things in C++ appropriate ways, so may be better if someone else does this.

From having read all of the relevant source and worked through the logic. The below steps if implemented via Chakra-Shim should serve to make CC emulate v8's behaviour on the necessary points:

Steps needed:

  1. Need the v8::Module class to include the following properties a. JsModuleRecord - the CC module record b. int - number of dependencies c. string (of some kind) Normalised specifier d. array of specifiers for dependencies e. status per the v8 module status enum f. a "WasParsed" boolean
  2. The v8::Module constructor would need to: a. use JsInitializeModuleRecord to create a module record b. set the number of dependencies to 0 c. set the Normalised specifier d. initialise the array of dependency specifiers as an empty array e. set the v8 status to kUninstantiated f. Set the WasParsed boolean to false.
  3. Need a map/lookup table of modules created so far available at the Isolate level or simply globally - this would be akin to the Map used in ch, Normalised Specifier keys and v8::Module values.
  4. Need to normalise specifiers within ChakraShim - as the timing of module record creation differs between the v8 and CC implementations Node currently normalises the specifier too late for CC- so the shim will have to do it too.
  5. FetchDependentModuleRecord call-back will need to: a. Normalise the specifier - need to duplicate node's methodology for this b. Check the map to see if this module has already been created c. If no create a new v8::Module and add it to the map d. Add one to the number of dependencies value for the parent module e. add the specifier to the parent module's array of specifiers
  6. ScriptCompiler::CompileModule will need to: a. Check the map to see if the module exists (unless it's a root module it should) b. If no (i.e. if it's a root module) create it c. If the FetchDependentModuleRecord call back is not yet set set it - only has to be done once per ScriptContext so could be done on start up or could be done specifically when we hit a root module here. d. Check if the WasParsed property is true, if yes return. e. If no: If we want to have the Url in stack traces use JsSetModuleHostInfo to set the normalised specifier as the Url f. Call JsParseModuleSource on the module record g. set the WasParsed property to true
  7. v8::Module::GetStatus would return the v8status
  8. v8::Module::GetException would use JsGetHostInfo to fetch an exception object
  9. v8::Module::GetModuleRequestsLength would return the value of the int dependencies property
  10. v8::Module::GetModuleRequest(i) would return members of the array of dependency specifiers
  11. v8::Module::Instantiate would simply set the kInstantiated property to true - it shouldn't need to do anything else - could possibly check that the module and all its dependencies had true for their "WasParsed" property and throw an exception if not. Note the callback that node passes to it should be able to be safely ignored.
  12. v8::Module::Evaluate would check that the status was kInstantiated and if yes call JsModuleEvaluation then set the status to kEvaluated
  13. v8::Module::GetModuleNamespace would map to JsGetModuleNamespace
  14. The NotifyHostModuleReady callback shouldn't need to do anything as node is going to call v8::Module::Evaluate on its own after calling v8::Module::Instantiate. It could be implemented as a simple return JsNoError - it could alternatively be used for a sanity check of some kind.

@boingoing I hope the above is helpful

Extra point: this suggestion if implemented in CC would enable removing a few steps from above: https://github.com/Microsoft/ChakraCore/issues/4740

rhuanjl avatar Feb 23 '18 18:02 rhuanjl

Spent some more time thinking on this - the above plan if implemented should work for supporting the "default" module loading implementation in Node.

But it would not support the vm.Module class at all: https://nodejs.org/api/vm.html#vm_class_vm_module

To make CC support that would I think require exactly emulating v8's api - which would have to involve logical changes in SourceTextModuleRecord.cpp, not just Jsrt changes - could potentially set up 2 different module loading paths in CC with a flag of some kind to set which one is used - feels like ugly codebase bloat to do it... May actually be easier to do than the hacky 80% emulation process I wrote out above though...

rhuanjl avatar Feb 28 '18 22:02 rhuanjl

@rhuanjl is there are an API Node could be using instead of V8's that would be abstract enough to work in both?

bmeck avatar Feb 28 '18 22:02 bmeck

(Note I'm not a microsoft employee just an external contributor with too much time on his hands)

@bmeck I'm sure one could be devised - not sure I'm the best person to do that - I've outlined the key point below as a start.

@boingoing please can you take a look at this and comment?

The key difference is when the two engines use a callback to set up references between parent and child modules - they do it at opposite ends of the process.

1. V8 ScriptCompile::CompileModule creates an array of specifiers for the children which the host can access but does no other linking.

Module::Instantiate has a callback for each child which requires that the host returns the module records (which it is assumed they will have loaded and parsed via ScriptCompile::CompileModule using the previously provided array as a reference).

2. ChakraCore Has a function called JsInitializeModuleRecord - this does not Parse any Source it simply creates a "blank" moduleRecord object which you would later supply when calling JsParseModuleSource.

JsParseModuleSource (the equivalent of ScriptCompile::CompileModule) has a callback function for each child - called with specifier and parent module parameters - which requires the host to return a moduleRecord object for the child - it doesn't have to be Parsed - it can be a newly created one from JsInitializeModuleRecord.

The records supplied to the callback function have to be the records you later supply to JsParseModuleSource when parsing the source for each of those children.

(CC's Instantiate is internalised - it keeps track of the referencing based on the records supplied in JsParseModuleSource callbacks and when all the records in the tree have been Parsed it performs Instantiation - providing no callback at this stage)

"Simple" reconciliation To have chakra-shim fake v8's behaviour my suggestion above is having chakra-shim resolve the urls for the child modules itself and maintain a map within the shim - then when a press source job for a child module is handed to ScriptCompile::CompileModule it should be able to check the url that's supplied (which node will have resolved using the same method) against the map in order to use the correct pre-existing blank moduleRecord object.

This will be rather messy (see outline above of required steps) and only works if chakra-shim can know exactly how urls will be resolved from specifiers BEFORE the first parent module is parsed.

rhuanjl avatar Mar 01 '18 00:03 rhuanjl

Alternative answer, in this commit: https://github.com/rhuanjl/ChakraCore/commit/9e1323e515bedb9f07f97adc2a1c5101506ef0b2 I've had a go at adding v8's module loading pathway to CC as an option so the host can choose the current way or the v8 way which should enable fairly implementation in chakra-shim.

It's not as bad as I thought it would be if you ignore the comments it's only about 140 lines added to CC itself (and a similar amount in ch). I'll try and get this into fit shape to actually be used shortly - the ch implementation in particular is not good enough yet (it currently passes 35/36 of CC's module tests)

rhuanjl avatar Mar 02 '18 09:03 rhuanjl

I've now got a fully working alternate module API for CC that basically mirror's v8's - see https://github.com/Microsoft/ChakraCore/issues/5240

If the CC team are willing to accept it in I can implement ESModule support in node-chakracore with it.

(Other alternative I can see would be to swap around here i.e. have node use an abstract API based on CC's API and emulate it with v8 - as v8's api is much lower level than CC's)

rhuanjl avatar May 28 '18 00:05 rhuanjl