apollo-client icon indicating copy to clipboard operation
apollo-client copied to clipboard

resetStore and refetchQueries can cause to refetch unmounted/inactive queries in StrictMode

Open Cellule opened this issue 1 year ago • 18 comments

Issue Description

We recently upgraded to version 3 Right now we use resetStore for a particular scenario. All graphql requests sent have an "organizationId" header associated. Depending on that "organizationId" the user will see different content. When the user changes to another organization we navigate to an almost empty page for the transition (to unmount all organization specific queries) Then we call resetStore to refetch top-level queries that "might" be affected by the change of context. Once everything resolves, we navigate back to the normal app.

This used to work fine for the last 5 years, then we updated to apollo client version 3.

After some digging we found out that after calling resetStore some queries that are unmounted (confirmed 100%) end up being refetched. I debugged through the resetStore method, but I found no calls to refetch those queries. So right now I have no idea what could cause those unmounted queries to be requested once again.

The biggest problem I have right now is that because those queries are called from the wrong context, it causes another mechanism to automatically redirect to the previous organization (so direct links work in any organization) I have a lot of fail safe in place and it's fine 90% of the time, but when the network is slower those fail safe are not enough and it leads to the user no being able to change organization.

Any guidance on what could cause an unmounted query to hit the network once more after calling resetStore would be appreciated Or tips on how to find out

Link to Reproduction

https://codesandbox.io/p/devbox/jolly-wilbur-n7m7jz?layout=%257B%2522sidebarPanel%2522%253A%2522EXPLORER%2522%252C%2522rootPanelGroup%2522%253A%257B%2522direction%2522%253A%2522horizontal%2522%252C%2522contentType%2522%253A%2522UNKNOWN%2522%252C%2522type%2522%253A%2522PANEL_GROUP%2522%252C%2522id%2522%253A%2522ROOT_LAYOUT%2522%252C%2522panels%2522%253A%255B%257B%2522type%2522%253A%2522PANEL_GROUP%2522%252C%2522contentType%2522%253A%2522UNKNOWN%2522%252C%2522direction%2522%253A%2522vertical%2522%252C%2522id%2522%253A%2522clwtppgmj00073p6i6i7ie1cc%2522%252C%2522sizes%2522%253A%255B70%252C30%255D%252C%2522panels%2522%253A%255B%257B%2522type%2522%253A%2522PANEL_GROUP%2522%252C%2522contentType%2522%253A%2522EDITOR%2522%252C%2522direction%2522%253A%2522horizontal%2522%252C%2522id%2522%253A%2522EDITOR%2522%252C%2522panels%2522%253A%255B%257B%2522type%2522%253A%2522PANEL%2522%252C%2522contentType%2522%253A%2522EDITOR%2522%252C%2522id%2522%253A%2522clwtppgmj00023p6iumaf2w8i%2522%257D%255D%257D%252C%257B%2522type%2522%253A%2522PANEL_GROUP%2522%252C%2522contentType%2522%253A%2522SHELLS%2522%252C%2522direction%2522%253A%2522horizontal%2522%252C%2522id%2522%253A%2522SHELLS%2522%252C%2522panels%2522%253A%255B%257B%2522type%2522%253A%2522PANEL%2522%252C%2522contentType%2522%253A%2522SHELLS%2522%252C%2522id%2522%253A%2522clwtppgmj00043p6igepxit5p%2522%257D%255D%252C%2522sizes%2522%253A%255B100%255D%257D%255D%257D%252C%257B%2522type%2522%253A%2522PANEL_GROUP%2522%252C%2522contentType%2522%253A%2522DEVTOOLS%2522%252C%2522direction%2522%253A%2522vertical%2522%252C%2522id%2522%253A%2522DEVTOOLS%2522%252C%2522panels%2522%253A%255B%257B%2522type%2522%253A%2522PANEL%2522%252C%2522contentType%2522%253A%2522DEVTOOLS%2522%252C%2522id%2522%253A%2522clwtppgmj00063p6ixzewo28y%2522%257D%255D%252C%2522sizes%2522%253A%255B100%255D%257D%255D%252C%2522sizes%2522%253A%255B69.49598743432026%252C30.504012565679744%255D%257D%252C%2522tabbedPanels%2522%253A%257B%2522clwtppgmj00023p6iumaf2w8i%2522%253A%257B%2522id%2522%253A%2522clwtppgmj00023p6iumaf2w8i%2522%252C%2522activeTabId%2522%253A%2522clwtpsi76003n3p6i7vuk56vl%2522%252C%2522tabs%2522%253A%255B%257B%2522id%2522%253A%2522clwtppgmj00013p6ipicylgfm%2522%252C%2522mode%2522%253A%2522permanent%2522%252C%2522type%2522%253A%2522FILE%2522%252C%2522filepath%2522%253A%2522%252FREADME.md%2522%252C%2522state%2522%253A%2522IDLE%2522%257D%252C%257B%2522id%2522%253A%2522clwtpsi76003n3p6i7vuk56vl%2522%252C%2522mode%2522%253A%2522permanent%2522%252C%2522type%2522%253A%2522FILE%2522%252C%2522initialSelections%2522%253A%255B%257B%2522startLineNumber%2522%253A25%252C%2522startColumn%2522%253A27%252C%2522endLineNumber%2522%253A25%252C%2522endColumn%2522%253A27%257D%255D%252C%2522filepath%2522%253A%2522%252Fsrc%252Flink.js%2522%252C%2522state%2522%253A%2522IDLE%2522%257D%252C%257B%2522type%2522%253A%2522FILE%2522%252C%2522filepath%2522%253A%2522%252F.codesandbox%252Ftasks.json%2522%252C%2522id%2522%253A%2522clwumzut3002e3p6gxrqqie21%2522%252C%2522mode%2522%253A%2522permanent%2522%252C%2522state%2522%253A%2522IDLE%2522%257D%255D%257D%252C%2522clwtppgmj00063p6ixzewo28y%2522%253A%257B%2522id%2522%253A%2522clwtppgmj00063p6ixzewo28y%2522%252C%2522activeTabId%2522%253A%2522clxyzoh8p000l3j6jh401n3i3%2522%252C%2522tabs%2522%253A%255B%257B%2522id%2522%253A%2522clwtppgmj00053p6iqkbdkq6f%2522%252C%2522mode%2522%253A%2522permanent%2522%252C%2522type%2522%253A%2522TASK_PORT%2522%252C%2522taskId%2522%253A%2522start%2522%252C%2522port%2522%253A3000%252C%2522path%2522%253A%2522%252F%2522%257D%252C%257B%2522type%2522%253A%2522SETUP_TASKS%2522%252C%2522id%2522%253A%2522clxyzmn6r000o3j6jbsttjmyy%2522%252C%2522mode%2522%253A%2522permanent%2522%257D%252C%257B%2522type%2522%253A%2522UNASSIGNED_PORT%2522%252C%2522port%2522%253A2222%252C%2522id%2522%253A%2522clxyzn4b5001p3j6jw7ckv2w6%2522%252C%2522mode%2522%253A%2522permanent%2522%257D%252C%257B%2522type%2522%253A%2522TASK_PORT%2522%252C%2522port%2522%253A3000%252C%2522taskId%2522%253A%2522start%2522%252C%2522id%2522%253A%2522clxyzoh8p000l3j6jh401n3i3%2522%252C%2522mode%2522%253A%2522permanent%2522%257D%255D%257D%252C%2522clwtppgmj00043p6igepxit5p%2522%253A%257B%2522id%2522%253A%2522clwtppgmj00043p6igepxit5p%2522%252C%2522activeTabId%2522%253A%2522clwtppgmj00033p6iw9cgtvcz%2522%252C%2522tabs%2522%253A%255B%257B%2522id%2522%253A%2522clwtppgmj00033p6iw9cgtvcz%2522%252C%2522mode%2522%253A%2522permanent%2522%252C%2522type%2522%253A%2522TASK_LOG%2522%252C%2522taskId%2522%253A%2522start%2522%257D%252C%257B%2522id%2522%253A%2522clwtqf2io00a13p6m5m1c3s1n%2522%252C%2522mode%2522%253A%2522permanent%2522%252C%2522type%2522%253A%2522TERMINAL%2522%252C%2522shellId%2522%253A%2522clwtqf2lx0074dke2ga51ahaj%2522%257D%255D%257D%257D%252C%2522showDevtools%2522%253Atrue%252C%2522showShells%2522%253Atrue%252C%2522showSidebar%2522%253Atrue%252C%2522sidebarPanelSize%2522%253A15%257D

Reproduction Steps

  • Go to "Home"
  • Go to "Home2"
  • Click "Reset Store" button
  • You'll notice the query "AllPeople" (only mounted in Home not Home2) will be sent when it should not

@apollo/client version

3.10.4

Cellule avatar Jun 28 '24 18:06 Cellule

Here are my findings so far (no repro still -_-)

I have a query on some page and it is set as "dirty" at some point https://github.com/apollographql/apollo-client/blob/d914d689abd9a3364cfa9c6bef8617d97df67a1c/src/core/QueryInfo.ts#L229-L234

Then the query is unmounted, but remains "dirty"

Later we call resetStore which calls reFetchObservableQueries https://github.com/apollographql/apollo-client/blob/d914d689abd9a3364cfa9c6bef8617d97df67a1c/src/core/ApolloClient.ts#L594-L603

In reFetchObservableQueries we call broadcastQueries https://github.com/apollographql/apollo-client/blob/d914d689abd9a3364cfa9c6bef8617d97df67a1c/src/core/QueryManager.ts#L969

In broadcastQueries we call notify for ALL queries even inactive ones https://github.com/apollographql/apollo-client/blob/d914d689abd9a3364cfa9c6bef8617d97df67a1c/src/core/QueryManager.ts#L1072-L1075

In notify for my problematic query, because it is "dirty" we call the listener which ends up calling reobserverCacheFirst on that query causing to send it to the server https://github.com/apollographql/apollo-client/blob/d914d689abd9a3364cfa9c6bef8617d97df67a1c/src/core/QueryInfo.ts#L251-L272

So it seems like at some point we should ignore "inactive" queries, aka observableQuery.hasObservers() is false

I think the right place to do this check would be in QueryInfo.shouldNotify https://github.com/apollographql/apollo-client/blob/d914d689abd9a3364cfa9c6bef8617d97df67a1c/src/core/QueryInfo.ts#L289-L292

Otherwise, maybe we should NOT call broadcastQueries in the first place from within reFecthObservableQueries ?

Cellule avatar Jun 28 '24 19:06 Cellule

Alright after going through this whole effort, I managed to make my minimal repro work!!

Cellule avatar Jun 28 '24 19:06 Cellule

@Cellule thanks for the bug report!

The reproduction is super helpful! I've cloned your reproduction and started looking into this to understand myself. Its close to the weekend and I'm not sure I'll be able to finish until early next week, but thanks so much for your thorough analysis. This will definitely save me a ton of time. I'll get back to you as soon as I can!

jerelmiller avatar Jun 28 '24 22:06 jerelmiller

@Cellule looking at this further, your diagnosis seems correct. I had originally assumed there was a bug holding onto the ObservableQuery in the this.queries map inside QueryManager, but that doesn't appear to be the case (hence my assumption in https://github.com/apollographql/apollo-client/issues/11894#issuecomment-2197707409).

I can now see that the only reason you're getting the refetch is because the ObservableQuery instance from the first page hasn't yet been garbage collected, so the notify call that you pointed out in your deep dive is triggering the refetch.

I've updated your reproduction to show that once ObservableQuery is garbage collected, we're able to see only a single network request (try clicking the "reset store" button after you see the log message about it getting garbage collected): https://codesandbox.io/p/devbox/fervent-scooby-c5xwk3. This at least confirms that we aren't holding onto that ObservableQuery instance somewhere else in the client 🎉.

I've discussed this with the team to make sure adding a check for hasObservers wouldn't have any other adverse effects, and we couldn't think of any. I think your suggested fix makes sense. I'd like to dig into the proper place for this, but otherwise I think we should move forward with this fix. If you're interested in submitting a PR, I'd be happy to review, otherwise I'll try and get something up within the next couple days.

Thanks again for your thorough deep dive through the code and the time getting a reproduction! This made my life so much easier ❤️.

jerelmiller avatar Jul 01 '24 19:07 jerelmiller

@jerelmiller Great to hear! I was about to open a PR then figured I should write a test for this. However, I can't figure out how to reach the state where there's an inactive ObservableQuery in the QueryManager. Actually, the more I review the code, the less I understand how it's possible to have an inactive query in the query manager 😅

Cellule avatar Jul 02 '24 16:07 Cellule

Haha right?!?! Thats why I was so confused why you were able to reproduce this because technically it shouldn't be possible 😆.

From what I understand, its the fact that the ObservableQuery instance still exists in memory and hasn't been garbage collected yet, so its able to respond to updates when broadcastQueries is run. The best way I could think to try and approach a test would be:

If you want to test this at the React level with useQuery:

  • Render a component with one query using useQuery.
  • After the fetch is complete, unmount that component and mount another component with another useQuery (a ternary would probably work just fine here). Make sure these are different queries so that query deduplication doesn't kick in giving you a false positive.
  • Execute client.reFetchObservableQueries
  • assert that only one of the two queries is executed.

If you opt for this approach, we have a handy Profiler utility that makes it easier to assert render-by-render. Here is an example of its usage with useQuery: https://github.com/apollographql/apollo-client/blob/1a9c68a4e6bc033498ce620681ae4a3720820ba4/src/react/hooks/tests/useQuery.test.tsx#L4177-L4432

If you want to test this with core APIs:

  • Create an observable with client.watchQuery
  • Subscribe to it, let the fetch finish, then unsubscribe from it
  • Create a 2nd observable with client.watchQuery. Ensure its a different query so that query deduplication doesn't kick in giving you a false positive.
  • Subscribe to it, let the fetch finish
  • Execute client.reFetchObservableQueries
  • Assert that only one of the two queries is executed.

If you opt for this approach, we have a handy ObservableStream utility that captures each emitted result that you can assert on in a more synchronous way. Here is an example of its usage: https://github.com/apollographql/apollo-client/blob/1a9c68a4e6bc033498ce620681ae4a3720820ba4/src/core/tests/ObservableQuery.ts#L2393-L2509

If you're struggling with either of these, let me know and I'd be happy to take a stab at creating a failing test for this to use as a base for implementing the change.

jerelmiller avatar Jul 02 '24 17:07 jerelmiller

So I'm not 100% sure why, but my current investigation leads me to this bit of code where InternalState.useObservableQuery will call this.client.watchQuery twice for the same query (can be seen on page load of my repro). The first one created seems to be abandoned, but remains in the list of query of the QueryManager.

https://github.com/apollographql/apollo-client/blob/116723d88b5600126cb266060ca58ce989d3988e/src/react/hooks/useQuery.ts#L531-L569

So in the end, it seems it's not an issue of query being cleaned up left in the QueryManager. It's a separate query that was never properly registered that is somehow in the list. When resetStore is called, that "never alive" query is revived. So maybe a proper fix would be to avoid this duplicate in the first place since I suspect there might be lots of "uninitialized" queries

Cellule avatar Jul 02 '24 18:07 Cellule

That sounds very familiar to https://github.com/apollographql/apollo-client/pull/11837 where I noticed something similar. Are you using React's strict mode by chance? I believe the the issue here was that the strict mode caused useState to run twice, which meant useInternalState was creating 2 instances of InternalState, which ran watchQuery twice: https://github.com/apollographql/apollo-client/blob/116723d88b5600126cb266060ca58ce989d3988e/src/react/hooks/useQuery.ts#L125

Take a look at the PR description in https://github.com/apollographql/apollo-client/pull/11837. Perhaps you might be seeing something similar. Funny enough, the fix for that one was also doing a check for hasObservers 😆

jerelmiller avatar Jul 02 '24 19:07 jerelmiller

It does look like StrictMode is the offender in the repro above where useRef in useInternalState is empty on both renders causing to create an ObservableQuery twice It would point to bad cleanup in useInternalState.

However, I'm hitting this issue in production where StrictMode should do nothing. Let me see if I can repro without StrictMode

Cellule avatar Jul 04 '24 11:07 Cellule

My biggest fear is that there's a memory leak somehow. I haven't been able to confirm it either in a repro nor in our production application, so I suppose that for now the initial patch should do the trick. I'll try to monitor afterward in case memory leak occurs

As for testing, I managed to write a test that fails without the fix and passes with it, it's not the cleanest approach, but let's discuss this on the PR directly

Cellule avatar Jul 04 '24 13:07 Cellule

Thanks for the PR! I'll take a look when I'm back on Monday. Have a great weekend!

jerelmiller avatar Jul 05 '24 21:07 jerelmiller

@jerelmiller Someone on my team, @lorenzopicoli , found another related issue. In one of our mutations, we're using the refetchQueries option

const [mutation] = useMutation<ICreateAssetMutationResp, ICreateAssetMutationArgs>(
    createAssetMutation,
    {
      ignoreResults: true,
      refetchQueries: [
        // needed because a sub-asset can be created at any level in the sub-assets page
        // called if in /assets/:assetId/subs, to refresh the descendants count in treeView
        "AssetParentTreeView",
        // called if in /assets/:assetId, to refresh the descendants count in sub-asset tree-view
        "AssetDetailsQuery",
      ],
    }
)

I went down through the code and ended up in QueryManager.getObservableQueries https://github.com/Cellule/apollo-client/blob/823b26e1073ae9c50d7c70078ea99e758975dab6/src/core/QueryManager.ts#L852-L900 It seems if include is an array of strings, it'll return queries matching the name even if they're not active. I feel this is a mistake, but I can't 100% confirm. I'll add a proposed fix in my PR for you to take a look

Cellule avatar Oct 11 '24 16:10 Cellule

I dug a bit more into it. From what I can see here's what happen (exacerbated by StrictMode, but possible without)

  • useQuery is called
  • client.watchQuery is called creating an observableQuery
  • React.useEffect() (actually useSyncExternalStore) is called to begin subscribing to the subscription
  • Something causes the component to unmount immediately (like StrictMode). The useEffect to start the subscription is never called Apparently this is possible when mounting/unmounting component relatively fast
  • Another query mount and goes through the process normally
  • InMemoryCache calls broadcastWatch after getting results from the "second" query, this causes the first query to become dirty
  • And bad state reached

Cellule avatar Oct 11 '24 17:10 Cellule

So clearly StrictMode is being a huge offender of this problem. I can see that the useState initializer function runs twice in StrictMode https://github.com/apollographql/apollo-client/blob/80471816396fa023fa2dc3b768ceb3da5b22edf4/src/react/hooks/useQuery.ts#L179-L205

https://react.dev/reference/react/useState#caveats Since there's no proper "cleanup" for the observable that is created in useState, we always end up with 2 observable for every query. According to this https://github.com/facebook/react/issues/20090#issuecomment-715926549 Creating a watcher/listener in useState is an anti-pattern and that's why it causes such problems in StrictMode

I'll try to come up with a better repro without StrictMode, but I do believe we should figure out and fix that issue in StrictMode as well to respect the rules of React

Cellule avatar Oct 15 '24 15:10 Cellule

@Cellule thanks for that confirmation. I suspected strict mode was the cause of at least one of the cases and agree we should try and do a better job here to avoid he double initialization. Do let me know if you're able to reproduce it without strict mode though. I'd be curious how that is allowed to happen, but my guess is the underlying cause is still very similar (e.g. multiple ObservableQuery instances are created)

jerelmiller avatar Oct 15 '24 16:10 jerelmiller

Now I'm starting to wonder if the main issue was fixed (aka dead queries in production) When I initially encountered this issue we were on apollo-client 3.10.4, we are currently on 3.10.8 and I can't seem to repro I reviewed the diff between those 2 versions and it does look like something changed around how useQuery handles its state v3.10.4...v3.10.8 Or more specifically from this PR https://github.com/apollographql/apollo-client/pull/11851

You probably have more context around that particular change, do you think this solved it ? If yes, then maybe we just have to focus on fixing the useState(createInternalState) in StrictMode

Cellule avatar Oct 17 '24 13:10 Cellule

@Cellule ok thats good news! That change was made to provide better compatibility with the new React Compiler. We were violating a few rules of React that would have made it difficult to integrate with it. Very possible the move away from the ref and into state helps here.

jerelmiller avatar Oct 17 '24 15:10 jerelmiller

Great, I updated the title of the issue to better represent the issue.

Cellule avatar Oct 17 '24 20:10 Cellule