
Federation sometimes loads a singleton share multiple times (concurrency issue)

Status: Open · foxylion opened this issue · 17 comments

Describe the bug

Reproduction

We have a complex application that sometimes initializes multiple remotes at the same time using loadRemote from enhanced Module Federation. Those remotes often share the same dependency, which has not yet been loaded from any remote.

Due to concurrency issues within Module Federation, the share is (sometimes) loaded multiple times, and some of the remotes then use a different copy of the share. This later leads to React hook errors (not part of the reproduction).

What I see

The parallel reproduction example shows the bug by loading 10 very similar remotes (only the dev server port and name differ) at the same time. The browser's developer tools show that shares are loaded multiple times (see the README for details).

What I expect

The sequential example in the reproduction shows how I would expect the shares to always be loaded (no duplicates).
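To make the symptom concrete, here is a deliberately simplified TypeScript model. It is not the Module Federation runtime's code; NaiveShareScope and fetchShare are hypothetical names, and the "network" is simulated with microtask delays. It shows how a check-then-act lookup that spans an await lets parallel callers each start their own fetch, while sequential callers reuse the first copy.

```typescript
// Hypothetical model of the symptom; NOT the Module Federation runtime.
interface ShareInstance {
  name: string;
  id: number; // distinguishes separate copies of the "same" share
}

let copiesCreated = 0;

// Simulated network fetch: resolves after a few microtask turns.
async function fetchShare(name: string): Promise<ShareInstance> {
  for (let i = 0; i < 3; i++) await Promise.resolve();
  copiesCreated++;
  return { name, id: copiesCreated };
}

class NaiveShareScope {
  private loaded = new Map<string, ShareInstance>();
  fetchCount = 0;

  async get(name: string): Promise<ShareInstance> {
    const existing = this.loaded.get(name);
    if (existing) return existing; // check...
    this.fetchCount++;
    const instance = await fetchShare(name); // ...then act after an await: the race window
    this.loaded.set(name, instance); // later arrivals overwrite earlier copies
    return instance;
  }
}

// Ten "remotes" requesting the same singleton at once vs. one after another.
async function demo(): Promise<{ parallel: number; sequential: number }> {
  const parallelScope = new NaiveShareScope();
  await Promise.all(Array.from({ length: 10 }, () => parallelScope.get("react")));

  const sequentialScope = new NaiveShareScope();
  for (let i = 0; i < 10; i++) await sequentialScope.get("react");

  return { parallel: parallelScope.fetchCount, sequential: sequentialScope.fetchCount };
}

demo().then((r) => console.log(r)); // { parallel: 10, sequential: 1 }
```

The real runtime's resolution logic is more involved; the model only reproduces the observable pattern from the reproduction (duplicates in parallel, none when sequential).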


I'm happy to provide any additional reproduction if required.

Reproduction

https://github.com/foxylion/module-federation-share-duplicate-repro

Used Package Manager

pnpm

System Info

System:
    OS: Linux 6.11 Ubuntu 24.04.2 LTS 24.04.2 LTS (Noble Numbat)
    CPU: (20) x64 12th Gen Intel(R) Core(TM) i7-12700H
    Memory: 6.91 GB / 31.02 GB
    Container: Yes
    Shell: 5.9 - /bin/zsh
  Binaries:
    Node: 20.18.0 - ~/.nvm/versions/node/v20.18.0/bin/node
    Yarn: 1.22.22 - ~/.nvm/versions/node/v20.18.0/bin/yarn
    npm: 10.8.2 - ~/.nvm/versions/node/v20.18.0/bin/npm
    pnpm: 9.15.4 - ~/.nvm/versions/node/v20.18.0/bin/pnpm
  Browsers:
    Chrome: 134.0.6998.88


foxylion, Mar 13 '25

There is no init(). registerRemotes adds a remote to the map, but without init() sharing is never initialized across the remotes. That's most likely the issue. https://module-federation.io/guide/basic/runtime.html

ScriptedAlchemy, Mar 13 '25

Must init() be called on the host and/or remote?

If the missing init is the problem, why does it work as expected when I call loadRemote() sequentially on the remotes, but not when I call it in parallel?

foxylion, Mar 13 '25

I have now added init() calls to the host and the remote; it did not change anything regarding the duplicated share loading.

I also tried moving the remote registration to rsbuild.config.ts and removing the registerRemotes() calls from the source, but this did not help either. That attempt can be seen as commented-out lines in host/rsbuild.config.ts.

I also tried using only a simple shared definition like shared: ["react", "react-dom", "react-dom/client"]; the issue is unchanged.

foxylion, Mar 13 '25

init() is called by anyone who consumes remote modules. If you have host->remote, then the host needs init().

If you have 3 containers, it would look like this: host(init)->remote(init)->remote2

Anyone importing remotes has to call init().

I also see you do not have anything in the shared object of the plugin, so that will also have an impact.

You can reference the CRA example, which uses rsbuild as well: https://github.com/module-federation/module-federation-examples/tree/d7909ea7e29e1da671e8703705020abdca647d7e/cra

ScriptedAlchemy, Mar 14 '25

While I'm injecting the plugin into rspack there, that's not needed; you can use the rsbuild plugin now. The important part is https://github.com/module-federation/module-federation-examples/blob/d7909ea7e29e1da671e8703705020abdca647d7e/cra/host/modulefederation.config.js

ScriptedAlchemy, Mar 14 '25

Looking at your repo, it seems that the two apps do not both opt into sharing.

Sharing requires both apps to agree to share modules, so the modules must be listed in each app's plugin config. Currently only your remote has shared listed, so the remote doesn't see that anyone else has shared modules (it is the only one that lists them), and thus everyone loads their own copy.

ScriptedAlchemy, Mar 14 '25

Must init() be called on the host and/or remote?

If the missing init is the problem, why does it work as expected when I call loadRemote() sequentially on the remotes, but not when I call it in parallel?

Because the remotes are initializing in parallel but are probably not registered upfront. Federation works by linking all containers together and making them agree on who will supply which dependency and where it lives. There are edge cases with dynamically loaded remotes, because remote 1 will not "see" remote 2 and will initialize with whatever it has in scope at that point.

This causes the share scope to tear in some cases: the former cannot see the latter, and since someone is already using "react", it cannot be unlinked from memory, so others may not see it.

So generally we suggest init(allKnownRemotes),

then later on in the app, loadRemote().

Then, for very dynamic cases, use registerRemotes, but we cannot guarantee that the share scope will maintain its integrity, since the objects are sealed in the already running containers.
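A minimal sketch of the "init everything first, loadRemote later" advice, assuming the init and loadRemote functions exported by @module-federation/enhanced/runtime as described in the runtime guide linked earlier. The remote names, ports, and the exposed module path "remote1/Button" are placeholders, not taken from the reproduction.

```typescript
import { init, loadRemote } from "@module-federation/enhanced/runtime";

// Register every known remote once, up front (names and URLs are placeholders).
init({
  name: "host",
  remotes: [
    { name: "remote1", entry: "http://localhost:3001/mf-manifest.json" },
    { name: "remote2", entry: "http://localhost:3002/mf-manifest.json" },
  ],
});

// Later, wherever the app actually needs the module:
export async function loadButton() {
  return loadRemote("remote1/Button");
}
```

As the follow-up comments in this thread show, this pattern alone did not remove the duplicate loads when several loadRemote calls overlapped in time.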

Let's say you have containers 1, 2, and 3.

The homepage loads Container1/home. You click on the about page, which loads Container2/about, and this one has lodash shared. Then you click back to something else that imports Container1/otherpage, which also uses lodash.

Container1 cannot see lodash from Container2 because Container1 was already initialized, and since Container2 wasn't in the scope at initialization time, Container1 has fewer shares in it than Container2.

Let's take another example.

container1->container2->container3, all loaded lazily with no upfront init. Let's assume each container adds one unique shared module to the share scope.

Here's what the containers will have in their share scope objects:

container1(1 shared key)->container2(2 shared key)->container3(3 shared key)

If you use the plugin with the remotes, or list the remotes and call init upfront, it would look like this:

container1(3 shared key)->container2(3 shared key)->container3(3 shared key), because everyone can shake hands with everyone else and agree together on what's available.
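The two pictures above can be reproduced with a small model. This is a hypothetical illustration (ContainerDef, lazyInit, and upfrontInit are invented for it), assuming, as described above, that each container keeps the share scope as it was at the moment it was initialized.

```typescript
// Hypothetical model of share-scope "tearing"; not the real runtime.
interface ContainerDef {
  name: string;
  sharedKey: string; // the one unique module this container adds to the scope
}

// Lazy: containers initialize one by one; each records the scope size at its own init time.
function lazyInit(chain: ContainerDef[]): Record<string, number> {
  const scope = new Set<string>();
  const view: Record<string, number> = {};
  for (const c of chain) {
    scope.add(c.sharedKey);
    view[c.name] = scope.size; // later additions are not seen by this container
  }
  return view;
}

// Upfront: all known remotes register first, so every container sees the full scope.
function upfrontInit(chain: ContainerDef[]): Record<string, number> {
  const scope = new Set(chain.map((c) => c.sharedKey));
  const view: Record<string, number> = {};
  for (const c of chain) view[c.name] = scope.size;
  return view;
}

const chain: ContainerDef[] = [
  { name: "container1", sharedKey: "a" },
  { name: "container2", sharedKey: "b" },
  { name: "container3", sharedKey: "c" },
];

console.log(lazyInit(chain)); // { container1: 1, container2: 2, container3: 3 }
console.log(upfrontInit(chain)); // { container1: 3, container2: 3, container3: 3 }
```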

Loading them in parallel likely causes a race condition, because the containers do not know about each other.

You can also look at the runtime-plugins folder in the federation examples repo. Those can give you manual control over things if you need it, for example forcing remotes to use share keys from a specific container.

ScriptedAlchemy, Mar 14 '25

First, thanks a lot for your detailed response; I'll have a detailed look at the examples as well. Some of your statements may need clarification, as I do not understand them completely.

so generally we suggest init(allKnownRemotes)

I thought I already understood this, but calling init(allremotes) at the beginning does not seem to help here. As long as the loadRemote(..) calls (and waiting for their promises) do overlap in time, there is a risk that one remote does not resolve the same shared dependencies as the others.

Here is a new, simplified example: https://github.com/foxylion/module-federation-share-duplicate-repro/tree/shared-host-and-remote

I added sharing of react, react-dom and react-dom/client to the host (but since the host does not need them, they will only be loaded when one of the remotes requests them). I also removed the whole runtime initialization by moving the remotes config to rsbuild.config.ts.

With these changes applied (and being in line with the example you provided here https://github.com/module-federation/module-federation-examples/blob/d7909ea7e29e1da671e8703705020abdca647d7e/cra/host/modulefederation.config.js), I still see the issues described above.


What I can see in the browser console is that, even when using init or the build configuration of remotes, the mf-manifest.json is only loaded when I make the first loadRemote call. So the statement

if you use the plugin with the remotes or list the remotes and call init upfront, it would look like this

container1(3 shared key)->container2(3 shared key)->container3(3 shared key) - because everyone else can shake hands with everyone and agree together on whats available.

does not really work out, because upon calling loadRemote the host only knows about a name and a URL, not about any shares the remote might have that need to be coordinated with other remotes.


Maybe it also helps if I highlight how our architecture in our current application is working:

  • We have a host application as some kind of "app shell", it knows about all the other remotes and can either use the remote definition in rsbuild.config.ts or at runtime using init().
  • We have ~30 remotes that bring things like top level buttons, pages, etc. to the app shell
  • There is currently a flat host → remote hierarchy, no remote will itself also have remotes
  • The host shares some dependencies; the remotes share some more dependencies that we see used frequently across our remotes (but not used in the host)
  • Sometimes (due to a navigation change or similar) we render multiple remotes for the first time in the same render cycle.
  • Doing so causes multiple remotes to load at the same time
  • Most of the time the remotes load shared dependencies only once, but sometimes, if the initialization of the remotes overlaps too much, we end up with multiple instances of a singleton share.

The example I created is a bit extreme, but it is more or less what we currently have in our setup.

foxylion, Mar 14 '25

Hmm, so it's an "in flight" problem. Let me speak to @2heal1; he knows the runtime in more detail.

ScriptedAlchemy, Mar 14 '25

@foxylion, you don't have this issue with import from (plugin-based loading)?

ScriptedAlchemy, Mar 14 '25

Ahh, I see! Because your host doesn't use react, there is no initial share scope to draw from. Can you make the host "use" react, so there's a common point to grab it from and it's not just a bunch of remotes?

ScriptedAlchemy, Mar 14 '25

Ahh i see! because your host doesnt use react, this means there is no initial share scope to have

Yes, exactly. We could certainly use/load react in the host, but this example is only a small one. We expect to share a lot more dependencies and hoped to load them only when really needed.

Most of the dependencies are only used on some remotes (let's say 10 out of 30).

Sharing would be used to optimize the total downloaded JS size across all bundles. But if we pre-load all shared dependencies, we would load a lot of JS code that is probably never used (because the user did not navigate to any route using a remote relying on the share).

I hoped there would be either

  • an approach to ensure that concurrent share loading requests do not result in duplicate share loads (e.g. a remote should not start loading a share if it can see that another remote has already started loading the same share; as far as I understand it, the __FEDERATION__ variable already has all the needed information available)
  • an approach to limit the sources the share could be loaded from to a single container (if this is somehow possible)

The first approach would be "better" as the federation would handle our concurrency issue out of the box.
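The first option corresponds to the well-known "single-flight" pattern: store the pending promise for a share before awaiting it, so that concurrent requesters join the same load instead of starting their own. The sketch below is hypothetical and self-contained (DedupingShareScope and fetchShare are invented names with a simulated network); it does not use or describe the actual __FEDERATION__ global.

```typescript
// Hypothetical single-flight share loader; not the Module Federation runtime.
interface ShareInstance {
  name: string;
  id: number;
}

let copiesCreated = 0;

// Simulated network fetch: resolves after a few microtask turns.
async function fetchShare(name: string): Promise<ShareInstance> {
  for (let i = 0; i < 3; i++) await Promise.resolve();
  copiesCreated++;
  return { name, id: copiesCreated };
}

class DedupingShareScope {
  private inFlight = new Map<string, Promise<ShareInstance>>();
  fetchCount = 0;

  get(name: string): Promise<ShareInstance> {
    let pending = this.inFlight.get(name);
    if (!pending) {
      this.fetchCount++;
      pending = fetchShare(name);
      // Registered synchronously, before any await, so concurrent callers find it.
      this.inFlight.set(name, pending);
      // If the load fails, forget it so a later call can retry.
      pending.catch(() => this.inFlight.delete(name));
    }
    return pending;
  }
}

async function demo(): Promise<void> {
  const scope = new DedupingShareScope();
  const copies = await Promise.all(Array.from({ length: 10 }, () => scope.get("react")));
  const distinct = new Set(copies.map((c) => c.id)).size;
  console.log({ fetches: scope.fetchCount, distinctCopies: distinct }); // { fetches: 1, distinctCopies: 1 }
}

demo();
```

The key design choice is that the map holds promises rather than finished instances, which closes the window between "is it loaded?" and "store the result".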

foxylion, Mar 14 '25

Can you make the host "use" react so theres a common point to grab it from and its not just a bunch of remotes?

I think I understand it better now: sharing "something" is what triggers the initialization of the federation share scopes.

https://github.com/foxylion/module-federation-share-duplicate-repro/tree/shared-host-and-remote-fixed

This example now works in that there are no duplicated share loads. But we are now seeing 20 additional requests during application load, even if no remote is needed yet.

In our production case this would be an additional 60 requests (30 remotes), totalling ~2.4 MB of minified but uncompressed data transfer.

In an ideal implementation I would hope the mf-manifest.json and /static/js/remote_xyz.js files would only be downloaded when the remote is used (loadRemote() is called).

foxylion, Mar 14 '25

Gotcha. Remotes do have the external runtime experiment, which will reduce the size in the remotes substantially. That said, if you really want to lazy load the container files themselves, there are some options to work around this for now.

  1. Add react to the host and import it somewhere so that the host thinks it's used; remotes can then lean on the host's react in this case.
  2. Use a custom runtime plugin with the loadShare or resolveShare hook (check the runtime plugin folder in the example repo). Then you can loop over the share scope and redirect dependencies to a specific remote. For instance, in next.js I force the remotes to always use the host's react and next shares by returning federation__.instances.shared[react] when the request is react. Effectively, you can replace our internal algorithm with your own mechanics.
  3. Load one remote first to kick-start it, or use loadShare(react) ahead of time, so that you can fetch a share before loadRemote takes place and help the system see that module ahead of time.
  4. Use a server or edge network to fetch and join all the remoteEntry.js contents into one payload/script, like static/assets/allRemotes.js, which will reduce the number of requests.
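Option 3 can be illustrated with a simplified model: if one request for the share completes before the parallel burst starts, even a naive check-then-act lookup finds it already loaded. NaiveShareScope and loadRemotesInParallel are hypothetical; in the real runtime the warm-up step would be loadShare or a first loadRemote, as the option describes.

```typescript
// Hypothetical model; not the Module Federation runtime.
class NaiveShareScope {
  private loaded = new Map<string, number>();
  fetchCount = 0;

  async get(name: string): Promise<number> {
    const existing = this.loaded.get(name);
    if (existing !== undefined) return existing;
    const copyId = ++this.fetchCount; // each fetch produces a new copy
    for (let i = 0; i < 3; i++) await Promise.resolve(); // simulated network delay
    this.loaded.set(name, copyId);
    return copyId;
  }
}

// Load `count` remotes that all need `share` at the same time.
async function loadRemotesInParallel(scope: NaiveShareScope, share: string, count: number): Promise<number[]> {
  return Promise.all(Array.from({ length: count }, () => scope.get(share)));
}

async function demo(): Promise<void> {
  const cold = new NaiveShareScope();
  await loadRemotesInParallel(cold, "react", 10);

  const warmed = new NaiveShareScope();
  await warmed.get("react"); // warm-up: fetch the share once before the parallel burst
  await loadRemotesInParallel(warmed, "react", 10);

  console.log({ cold: cold.fetchCount, warmed: warmed.fetchCount }); // { cold: 10, warmed: 1 }
}

demo();
```

The trade-off is the one discussed in this thread: warming up a share costs a download even if no remote ends up needing it.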

We will look into the condition you raised, but my team has some other priorities to address, so it won't be until sometime next week that we can investigate it in more detail.

  5. The manifest.json can be processed on the server to calculate the module snapshot; you can see a reference here: https://github.com/2heal1/module-federation-incorrect-version-resolution/blob/chore/use-enhanced/server.mjs

Look at the runtimePlugin too: we register a module snapshot, which more or less lets us precompute the lockfile. The client then does not load the JSON file, since the server has already calculated it and we don't need to recompute it in the browser on demand. This is how we do it at ByteDance.

ScriptedAlchemy, Mar 14 '25

Thanks again for the detailed response.

For the moment, we have settled on pre-loading all remotes (only mf-manifest.json and <remote-id>.js, nothing else); in addition, we switched to relying on the host federation instance. This reduced the overall code size, and with HTTP/2 the loading impact is minimal on a good network connection.

In the future we might consider your second idea in combination with a single source for all the shared dependencies. But maybe we won't need to, if the race condition gets fixed at some point? ;-)

Your last approach might also work for us in the future, when we change the way we deploy and serve our remotes (which is on our roadmap).

foxylion, Mar 18 '25

We will look at the race condition; we just have some other things backlogged for the company right now.

ScriptedAlchemy, Mar 18 '25

Try setting shareStrategy to "loaded-first" instead of "version-first".
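As the option names suggest, "version-first" prefers the highest registered version of a share, while "loaded-first" prefers reusing a copy that is already in use. The model below is a hypothetical simplification (Provider, selectProvider, and the "loading" state are assumptions, not the runtime's types). It treats an in-flight copy as reusable and shows why the second policy avoids pulling in another copy once one is loading; it does not reproduce the runtime's exact race.

```typescript
// Hypothetical model of the two shareStrategy policies; not the runtime's code.
type ShareState = "registered" | "loading" | "loaded";
type Strategy = "version-first" | "loaded-first";

interface Provider {
  from: string; // container that registered this copy
  version: string; // plain "major.minor.patch"
  state: ShareState;
}

// Numeric compare of "major.minor.patch" strings (no pre-release tags).
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Highest version in a non-empty list.
function highest(providers: Provider[]): Provider {
  return providers.reduce((best, p) => (compareVersions(p.version, best.version) > 0 ? p : best));
}

function selectProvider(providers: Provider[], strategy: Strategy): Provider {
  if (strategy === "loaded-first") {
    // Reuse a copy that is already loaded or still in flight, if any.
    const active = providers.filter((p) => p.state !== "registered");
    if (active.length > 0) return highest(active);
  }
  return highest(providers);
}

const providers: Provider[] = [
  { from: "remote1", version: "18.2.0", state: "loading" },
  { from: "remote2", version: "18.3.1", state: "registered" },
];

console.log(selectProvider(providers, "version-first").from); // remote2 (a second copy)
console.log(selectProvider(providers, "loaded-first").from); // remote1 (reuses the in-flight copy)
```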

ScriptedAlchemy, Mar 22 '25

Thanks a lot @ScriptedAlchemy, this really did the trick.

For reference (in case anyone else has a similar challenge), this is a working solution: https://github.com/foxylion/module-federation-share-duplicate-repro/tree/fixed-with-loaded-first
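For orientation, this is roughly where the option goes in an rsbuild host config. It is a sketch assuming @module-federation/rsbuild-plugin, which accepts the Module Federation plugin options including shareStrategy. The remote name, port, and shared list are placeholders; per the earlier comments, the remotes also list the shared dependencies, and the linked branch is the authoritative working version.

```typescript
// rsbuild.config.ts (host); names and URLs are placeholders
import { defineConfig } from "@rsbuild/core";
import { pluginReact } from "@rsbuild/plugin-react";
import { pluginModuleFederation } from "@module-federation/rsbuild-plugin";

export default defineConfig({
  plugins: [
    pluginReact(),
    pluginModuleFederation({
      name: "host",
      remotes: {
        remote1: "remote1@http://localhost:3001/mf-manifest.json",
      },
      shared: {
        react: { singleton: true },
        "react-dom": { singleton: true },
      },
      // Prefer reusing an already loaded copy of a share over the highest registered version.
      shareStrategy: "loaded-first",
    }),
  ],
});
```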

foxylion, Mar 31 '25

Yeah, version-first seems to have a race condition in this case. We will look at it.

ScriptedAlchemy, Mar 31 '25