core [runtime] Manifest fetch/logger error can throw unhandleable errors

Describe the bug

Issue

When the MF runtime attempts to fetch any remote registered with an mf-manifest.json and the fetch fails, an exception is thrown. Today, there is no hook for handling this. A hook exists to provide the fetch itself, but adding a .catch to that fetch does not resolve the issue, since other internal logic still "logs" the error, which leads to the exception being thrown regardless.

Depending upon what triggered the runtime to attempt the fetch, it might not even be something we can catch at all. In this recreation project, I am triggering the manifest fetch manually with a preloadRemote call. However, in our more complicated internal project (that I unfortunately cannot share here) even just shared dependency loading can lead to remote manifests getting fetched and throwing an error we cannot catch and handle. This happens in the snapshot plugin even if nothing is done with those remotes yet. This leads to the MF runtime throwing exceptions beyond our control that crash portions of our app, such as routing, which can affect routes unrelated to the unreachable remote whenever any remote is unavailable for any reason.

Cause

Any call to SnapshotHandler.getManifest() will cause an unhandled exception to get thrown that we potentially have no way to handle. This can be triggered by preloadRemote, but also by the MF runtime itself when, for example, ShareHandler.initializeSharing() is called. That can trigger initRemoteModule() and eventually getManifest().

While the fetch within getManifest() is itself handled with a try/catch, it calls an error(msg) logging method that then rethrows an error with the logged message. I think this is really the root of the issue here tbh.
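
To make the pattern described above concrete, here is a minimal sketch. It is not the runtime's actual source: `logError` and `getManifestSketch` are hypothetical stand-ins for error(msg) and getManifest(), and the fetch is modelled as a synchronous callback to keep the example short.

```typescript
// Hypothetical stand-in for the runtime's error(msg) helper: it logs, then throws.
function logError(msg: string): never {
  console.error(msg);
  throw new Error(msg);
}

// Hypothetical stand-in for getManifest(): the fetch failure is caught,
// but reporting it through logError throws a new exception from the catch block.
function getManifestSketch(fetchManifest: () => string): string {
  try {
    return fetchManifest();
  } catch (e) {
    return logError(`Failed to get manifest: ${String(e)}`);
  }
}

// The caller still receives an exception even though the fetch itself was "handled".
function demo(): string {
  try {
    getManifestSketch(() => {
      throw new Error('network down');
    });
    return 'no error';
  } catch (e) {
    return (e as Error).message;
  }
}
```

Under these assumptions, the try/catch around the fetch only changes which error the caller sees, not whether one is thrown.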

Why is this a problem? Because routing can often trigger dependencies to be imported. The error thrown here disrupts that process and causes at least @tanstack/react-router to fail on any route change, even those not obviously related to the problematic remote in any way.

Links to referenced methods:

Potential Solution

We need a way to handle any errors thrown by the getManifest fetch before it calls error(msg), or really just exceptions thrown by the error(msg) function in general. Anything that could log an error through that error method might result in this issue, to be honest. (Does this need to throw the message instead of just logging it as an error?)

Throwing errors is fine, as long as we have a path to hook into handling them. An errorLoadManifest hook might fit in line with errorLoadRemote? Alternatively, just a handleError hook for dealing with errors that get logged via error(msg), instead of having that rethrow them no matter what.
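
As a rough illustration of the second idea, the sketch below shows one way an error(msg)-style reporter could defer to an optional handler. Everything here is hypothetical: `createErrorReporter` and `handleError` are invented names for illustration and are not part of the Module Federation runtime.

```typescript
type ErrorHandler = (err: Error) => void;

// Hypothetical: build an error(msg)-style reporter that defers to an optional handler.
function createErrorReporter(handleError?: ErrorHandler) {
  return function report(msg: string): void {
    const err = new Error(msg);
    console.error(msg);
    if (handleError) {
      // The host application decides what to do: swallow, report, or rethrow.
      handleError(err);
      return;
    }
    // No handler registered: keep the throwing behaviour described in this issue.
    throw err;
  };
}
```

One caveat with this shape: callers that currently rely on error(msg) never returning would need to cope with it returning normally once a handler is registered.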

Potential Alternatives

errorLoadRemote does not help here. This gets thrown separately from that hook regardless. I included the common example implementation for handling issues with that plugin hook to showcase it not helping here.
This is not a CORS issue. If a remote is unavailable for a moment, we can't have the entire site potentially throwing errors.
We could potentially add some custom logic to error handling in our router more globally that tries to catch errors thrown up from dependency resolution it inadvertently triggers, but it'd be difficult to reliably discern what is a generic error being thrown by the MF runtime and how to handle it properly from there. Primarily I think the issue with this path is that in routers there isn't a way to tell it the error is OK to ignore. Only a way to handle what to render to inform the user it has happened.
A similar thread suggested a plugin that adjusts shareStrategy fixes this. It fixes a similar issue, but will not help handle this kind of exception getting thrown at all.

There are a few other past closed issues I read through on this that were either only tangentially related or never really uncovered the heart of this particular issue. If I did miss one that calls out a way to handle this kind of edge-case though please let me know and I'll review to confirm what fix/workaround/implementation adjustment works.

Reproduction

https://github.com/hrmcdonald/mf-runtime-fetch-manifest-issue

Used Package Manager

npm

System Info

System:
    OS: macOS 14.7.2
    CPU: (12) arm64 Apple M2 Max
    Memory: 85.31 MB / 32.00 GB
    Shell: 5.9 - /bin/zsh
  Binaries:
    Node: 20.18.0 - ~/.nvm/versions/node/v20.18.0/bin/node
    npm: 10.8.2 - ~/.nvm/versions/node/v20.18.0/bin/npm
    pnpm: 8.15.9 - ~/.nvm/versions/node/v20.18.0/bin/pnpm
  Browsers:
    Chrome: 132.0.6834.84
    Safari: 18.2

Validations

[x] Read the docs.
[x] Read the common issues list.
[x] Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
[x] Make sure this is a Module federation issue and not a framework-specific issue.
[x] The provided reproduction is a minimal reproducible example of the bug.

Jan 21 '25 18:01 hrmcdonald

let me check

Jan 23 '25 08:01 danpeen

I believe we have just hit this as well. Our Nx monorepo loads remotes at startup via init(), but if any remote is misconfigured / unavailable then we hit getManifest(), which throws, and there doesn't seem to be a way to catch the error so our whole app crashes.

Jan 31 '25 12:01 DGAISmith

I think we should let the errorLoadRemote hook catch this error, and after that we could do something to prevent this error from crashing the whole app.

Feb 05 '25 03:02 danpeen

I think we should let the errorLoadRemote hook catch this error, and after that we could do something to prevent this error from crashing the whole app.

Yes, that makes total sense if that is the path y'all want to take. I just didn't want to be presumptive in the original issue. Just having some way to handle errors thrown here would be great.

Feb 05 '25 14:02 hrmcdonald

Hi @hrmcdonald and @DGAISmith,

I'm pleased to inform you that we've merged the PR addressing this issue. We've also published a comprehensive guide on handling remote rendering errors, which you can find here: https://module-federation.io/blog/error-load-remote.html

We're planning to release an official version today that includes these improvements. The blog post details various strategies for handling remote rendering failure scenarios, which should help address the challenges you've encountered.

Once the official release is out, you'll be able to implement these error handling strategies in your application. I encourage you to try them out and let us know how they work for you.

Your feedback and suggestions are always valuable to us as we continue to improve Module Federation. Please don't hesitate to share your thoughts or reach out if you need any assistance.

Best regards.

Feb 07 '25 04:02 danpeen

Awesome news, the new blog post is great to see as well! We'll take a look when this update is published. Thanks again for looking into and resolving this one so quickly 🎉

Feb 07 '25 06:02 hrmcdonald

Looking into this some more today @danpeen. I think one problem that we still are hitting here is not being able to basically ignore a failed manifest load even when it is not affecting the current route.

See the assert at this line: https://github.com/module-federation/core/blob/4ef21d2a84a500ad1bb8fa71f3901de9b062f96c/packages/runtime-core/src/plugins/snapshot/SnapshotHandler.ts#L336

This still throws an error that isn't handled since the result from the errorLoadRemote hook is not a valid manifest object.

The challenge here is that this can happen even if nothing on the route here needs anything from that remote. So in effect, if any manifest cannot be loaded routing can break completely.

The only workaround I can think of at the moment is to have the errorLoadRemote hook return a fake mocked manifest object to trick the assert into thinking the manifest loaded successfully and prevent it from throwing an error (which is probably a bad idea anyway). However, to do that at all, you have to know ahead of time the name of the remoteEntry to mock, which is not possible.

So unfortunately, I still can't really see a way around this issue at the moment. There is still an error thrown when a manifest cannot be fetched successfully (now because of the assert) that we have no way to suppress or catch.

Any suggestions for how to deal with this?

Feb 15 '25 23:02 hrmcdonald

Any thoughts on this @danpeen or anyone else? There still doesn't seem to be a way to handle any module import/resolution throwing a potentially uncatchable error when even an unrelated remote manifest is unreachable?

Mar 06 '25 16:03 hrmcdonald

Hi @hrmcdonald, you can save your manifest to localStorage in the fetch hook whenever your manifest is reachable. The next time the manifest is unreachable, you can return the manifest you cached last time from the errorLoadRemote hook. Something like this:

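// LocalStorage is the commenter's placeholder for an async key-value wrapper; it is not defined here.
import type { FederationRuntimePlugin } from '@module-federation/enhanced/runtime';

const yourRuntimePlugin: () => FederationRuntimePlugin = function () {
    return {
        name: 'your-runtime-plugin',
        async fetch(manifestUrl, requestInit) {
            // Fetch once and hand the same response back to the runtime.
            const res = await fetch(manifestUrl, { ...requestInit });
            try {
                // Cache a parsed copy; clone() leaves the body of the returned response unread.
                const manifest = await res.clone().json();
                await LocalStorage.setValue({
                    key: manifestUrl,
                    data: manifest,
                });
            } catch (e) {
                console.log('caching manifest failed:', e);
            }
            return res;
        },
        async errorLoadRemote(args) {
            console.warn('errorLoadRemote:', args);

            if (args.lifecycle === 'afterResolve') {
                // Assumes args.id matches the key used when caching above.
                const cachedManifest = await LocalStorage.getValue({
                    key: args.id,
                });
                return cachedManifest;
            }
        },
    };
};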

Mar 27 '25 03:03 ra1nj

I'm having the same problem, although mine uses the build plugin. With the latest updates I am able to catch failures loading remoteEntry and manifest files, but for the same reasons as above, it's hard to find a graceful solution when any of the manifests fails to load. In our case, a failed manifest load breaks the entire app. I can hack together a simulated manifest that keeps the pages that don't use the problematic module from crashing, but what I really need is a way to fall back to static content, or a React component or something that isn't fetched from the remote. It seems this can be done with remoteEntry.js files, but not manifests

Mar 27 '25 19:03 nickhall

@ra1nj yeah there are ways to workaround the issue by either significantly delaying ModuleFederation init (attempting to fetch a manifest and confirm a 200 response before even registering it with the runtime) or by working around it after a failed load (sort of like you are suggesting). Even that solution doesn't super help because sometimes if the manifest is not reachable, an old copy of it is likely to not help either since the assets at that location are likely not reachable either.
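
For reference, the "confirm a 200 response before registering" workaround mentioned above might look roughly like the sketch below. `RemoteConfig`, `FetchLike`, and `filterReachableRemotes` are hypothetical names, and the fetch implementation is injected so the check can run outside a browser; the surviving list would then be passed to whatever registration call the app already uses.

```typescript
// Hypothetical shapes for illustration; not the runtime's own types.
type RemoteConfig = { name: string; entry: string };
type FetchLike = (url: string) => Promise<{ ok: boolean }>;

// Probe every manifest URL and keep only the remotes that answered successfully.
async function filterReachableRemotes(
  remotes: RemoteConfig[],
  fetchImpl: FetchLike,
): Promise<RemoteConfig[]> {
  const checks = await Promise.all(
    remotes.map(async (remote) => {
      try {
        const res = await fetchImpl(remote.entry);
        return res.ok ? remote : null;
      } catch {
        // Network-level failure: treat the remote as unavailable.
        return null;
      }
    }),
  );
  return checks.filter((r): r is RemoteConfig => r !== null);
}
```

This delays startup by one round trip per remote and does not cover a remote that goes down after the check, which is part of why it is only a workaround.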

After understanding Module Federation a bit more, I think the reason this is an issue comes down to dependency resolution. After remotes are registered, when a dependency is imported, the MF runtime (or just at build time) attempts to pull in all of the remote manifests so it can best decide which version to load from where according to all of the different remotes' defined shared rules. This is a good thing and happens this way by design for good reasons.

The problem is that when one of those manifests fails to load, there is no way for the module federation logic handling the dependency/shared resolution/optimization to essentially give up and ignore a bad remote that cannot be loaded. It will throw an error, but we have no way to act upon that error to allow the app to continue to build/render gracefully without that remote, because we've already been ejected from the import call that triggered it. At runtime, I'd rather be able to not load that remote and let any references to that remote fail as they are encountered instead of all imports globally getting blocked. I can understand why you might still want to fail outright at build time by default, but that should almost never be the ideal behavior on the client.

I think perhaps there needs to be a way to unregister a remote that fails to load and signal to the runtime that it should give up on including the problem remote whose metadata cannot be fetched in its attempts to evaluate all shared metadata for dependency resolution. Something to signal it should just skip considering that remote at all and keep going.

@danpeen or maybe @ScriptedAlchemy, thoughts on how we could handle this? It feels like there might need to be an API change here that touches what I assume might be some core dependency/shared resolution logic and I get that that might be a larger effort or take some time to even get around to. Is it something that would make sense to consider though? We might be interested in helping if there were some direction on how y'all would prefer to go about it since I'm certainly no expert here atm.

I understand why this hasn't been something considered given the dynamic runtime option is relatively new, but without a way to deal with this it's really hard to be able to utilize dynamic/runtime federation and build a reliable system because as is - when anything is down, then everything is down.

Potential mitigation API?

There are a lot of different ways I imagine this could be handled, but perhaps something along these lines could make sense:

errorLoadRemote(args) {
  const { lifecycle, skipRemote } = args;
  if (lifecycle === 'onLoad') {
    skipRemote();
  } else if (lifecycle === 'beforeRequest') {
    return args;
  }
}

Mar 27 '25 20:03 hrmcdonald

when errorLoadRemote fails - you should be able to return a function / component as a fallback. Would this example work? https://github.com/module-federation/module-federation-examples/tree/master/runtime-plugins/offline-remote It uses the js entry but in theory it should be able to return a function back as the fallback loadRemote component you wanted - however the actual mechanics of loadingManifest may get in the way where the runtime still expects a json response - thus while it already supports function return - it might be that we need to say something like if typeof function => skip trying to read manifest data and instead go right to returning the function as the "loaded remote"

Mar 28 '25 07:03 ScriptedAlchemy

when errorLoadRemote fails - you should be able to return a function / component as a fallback. Would this example work? https://github.com/module-federation/module-federation-examples/tree/master/runtime-plugins/offline-remote It uses the js entry but in theory it should be able to return a function back as the fallback loadRemote component you wanted - however the actual mechanics of loadingManifest may get in the way where the runtime still expects a json response - thus while it already supports function return - it might be that we need to say something like if typeof function => skip trying to read manifest data and instead go right to returning the function as the "loaded remote"

Yeah @ScriptedAlchemy, we have basically that exact same plugin setup in our project at the moment. The issue occurs when errorLoadRemote is called trying to fetch a manifest.json file and not an actual JS module. It's before importing of the requested dependency even takes place. I believe it's getting called while the runtime is fetching all of the registered remote manifests/remoteEntries so it can try to reconcile how to most optimally fetch shared deps. As a result, all new shared dependency import attempts, that occur after a manifest that is offline is registered, will fail trying to fetch the manifest again instead of returning any module response. (have not tested pointing to a remoteEntry.js directly here)

It's not even realistically possible to mock a manifest file response either because there is no way to know what name values and metadata values should be provided in the mock response. I've tried this and it ends up just triggering more and more errors downstream as MF is trying to parse the manifest file snapshots because of this.

So because fetching the manifest fails, basically all shared dependency resolution fails from that point forward. So yes, if a function is returned from errorLoadRemote while a manifest is being fetched, instead of throwing an error, ideally the runtime just leaves that manifest out of dependency resolution altogether without throwing a breaking error/assertion. This would let any imports to modules provided by a downed remote just fail as if the remote wasn't registered to begin with so they can be handled in isolation instead of bringing down all shared imports.

Perhaps there should just be a different hook for errorLoadManifest or something like that so those can be handled more explicitly? Functionally, as long as we can skip an offline manifest though, this issue would go away.

Apr 01 '25 18:04 hrmcdonald

This is a must for us: the entire page breaks if an mf-manifest.json remote fails (it's unlikely, but it needs to be covered). errorLoadManifest would be a great solution. I'm struggling to understand the codebase, but do any contributors have time to look at this? If not, I totally understand! Would be very appreciated! <3 @ScriptedAlchemy @danpeen

Apr 08 '25 22:04 cain

@hrmcdonald @cain Sorry for the late response. I will take a look tomorrow. I think this is not the expected behavior. When fetching manifest.json fails, we expect the errorLoadRemote hook to be triggered and to support returning static JSON data. I will use the demo provided by #3673 as a reproduction to troubleshoot this problem.

Apr 09 '25 13:04 danpeen

The difficulty as it exists currently is that when loading the manifest fails, there's no good way to know which file failed, and because it expects you to return a valid JSON manifest, there's no straightforward way to provide fallback content like when catching remoteEntry.js failures. It was mentioned above that you might try caching manifests to local storage and falling back to that, but this assumes that the remote can be loaded in the first place and doesn't help in a situation where the user has never successfully loaded the manifest. The app just crashes.

My ideal solution is some easy way to specify a universal fallback in the event that a manifest fails to load, which would allow me to show a generic error component instead.

Apr 09 '25 18:04 nickhall

@hrmcdonald I think you can set errorLoadRemote like this. When fetching the manifest JSON fails, we trigger the errorLoadRemote hook with the lifecycle 'afterResolve', so you can return your fallback static manifest.json data there, and it will work. I have mentioned it in this blog: https://module-federation.io/blog/error-load-remote.html#plugin-registration--synchronous-import

errorLoadRemote(args) {
    if (args.lifecycle === 'afterResolve') {
      // Use predefined backup manifest static data
      const backupManifest = {
        "id": "federation_provider",
        "name": "federation_provider",
        "metaData": "xxx", // placeholder; a real fallback needs the full metaData object
      };
      console.log('***** fallback to backupManifest ****', backupManifest);
      return backupManifest;
    }
}

https://github.com/user-attachments/assets/1691df6f-7979-4fbe-bf84-2ce530dfa9ac

and also we need to upgrade to the latest version.

cc @cain @nickhall @ScriptedAlchemy

Apr 10 '25 04:04 danpeen

@danpeen thanks for the reply!

The backupManifest object does work like you showed. I think that's the best solution for now.

I'm just using a "fake" manifest.

        const backupManifest = {
          id: 'fallback',
          name: 'fallback',
          metaData: {
            name: 'fallback',
            type: 'app',
            buildInfo: {
              buildVersion: 'local',
              buildName: 'fallback',
            },
            remoteEntry: {
              name: 'remoteEntry.js',
              path: '',
              type: 'global',
            },
            types: {
              path: '',
              name: '',
              zip: '@mf-types.zip',
              api: '@mf-types.d.ts',
            },
            globalName: 'fallback',
            pluginVersion: '1',
            prefetchInterface: false,
            publicPath: 'https://example.com/',
          },
          shared: [],
          remotes: [],
          exposes: [],
        };

I do think that an errorLoadManifest hook would be a better solution in the long term!

Apr 11 '25 04:04 cain

@cain Haha, because many factors can affect remote module rendering, we handle them all in the unified errorLoadRemote hook. In this hook we provide detailed module info through the args parameter to help users control their fallback logic.

So at least for now, we think errorLoadRemote is OK for handling the different remote errors. If, in the future, we really need another error-handling hook to address a problem that is not appropriate to handle in errorLoadRemote, I think we will provide hooks like errorLoadManifest for it.

Apr 11 '25 07:04 danpeen

Thanks @danpeen! With the updates and the latest discussion here we are now able to successfully handle this issue.

I'll note the snippet in your comment here doesn't work as is because it does not mock out the manifest fully it seems. But using the full mock @cain provided here did the trick!

This is a big help and improvement, so thanks for sticking with it @danpeen. I think we are OK to close this issue now if everyone has been able to handle this with the latest updates as well.

I noticed the blog post on error handling now includes this condition as well, which is great. You might want to include the mock Cain provided above, though, for situations where fetching a remote backup is not possible, since mocking the full manifest isn't an obvious process.
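
For anyone landing here later, the pieces from this thread can be combined roughly as follows. This is only a sketch: the field layout is copied from the mock Cain shared above rather than from an official schema, `createFallbackManifest` and the plugin name are hypothetical, and the hook argument is typed locally with just the `lifecycle` field discussed in this thread.

```typescript
// Field layout copied from the mock manifest shared earlier in this thread;
// it is not an official schema and may need adjusting between versions.
function createFallbackManifest(name: string, publicPath: string) {
  return {
    id: name,
    name,
    metaData: {
      name,
      type: 'app',
      buildInfo: { buildVersion: 'local', buildName: name },
      remoteEntry: { name: 'remoteEntry.js', path: '', type: 'global' },
      types: { path: '', name: '', zip: '@mf-types.zip', api: '@mf-types.d.ts' },
      globalName: name,
      pluginVersion: '1',
      prefetchInterface: false,
      publicPath,
    },
    shared: [],
    remotes: [],
    exposes: [],
  };
}

// Minimal local type for the hook argument; only `lifecycle` is taken from the thread.
type ErrorLoadRemoteArgs = { lifecycle: string };

const fallbackManifestPlugin = () => ({
  name: 'fallback-manifest-plugin', // hypothetical plugin name
  errorLoadRemote(args: ErrorLoadRemoteArgs) {
    // Per the discussion above, a failed manifest fetch arrives with 'afterResolve'.
    if (args.lifecycle === 'afterResolve') {
      return createFallbackManifest('fallback', 'https://example.com/');
    }
    return undefined;
  },
});
```

Because the fallback exposes nothing, imports from the downed remote would presumably still fail, but they fail individually instead of blocking every shared import.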

Apr 11 '25 16:04 hrmcdonald

@hrmcdonald OK, cool! No problem, I will update my mock data with the data Cain provided. Thank you everyone!

Apr 13 '25 13:04 danpeen

OK great, I think we're good to close this issue now then. If anyone else runs into something along these lines feel free to comment and re-open from there.

Apr 14 '25 20:04 hrmcdonald

@hrmcdonald I think you can set errorLoadRemote like this. When fetching the manifest JSON fails, we trigger the errorLoadRemote hook with the lifecycle 'afterResolve', so you can return your fallback static manifest.json data there, and it will work. I have mentioned it in this blog: https://module-federation.io/blog/error-load-remote.html#plugin-registration--synchronous-import

errorLoadRemote(args) {
    if (args.lifecycle === 'afterResolve') {
      // Use predefined backup manifest static data
      const backupManifest = {
        "id": "federation_provider",
        "name": "federation_provider",
        "metaData": "xxx", // placeholder; a real fallback needs the full metaData object
      };
      console.log('***** fallback to backupManifest ****', backupManifest);
      return backupManifest;
    }
}

and also we need to upgrade to the latest version.

cc @cain @nickhall @ScriptedAlchemy

@danpeen do you mean in rspack ecosystem we need to upgrade the provided version?

Apr 17 '25 02:04 ScriptedAlchemy