apollo-client
apollo-client copied to clipboard
resetStore and refetchQueries can cause to refetch unmounted/inactive queries in StrictMode
Issue Description
We recently upgraded to version 3
Right now we use resetStore for a particular scenario.
All graphql requests sent have an "organizationId" header associated.
Depending on that "organizationId" the user will see different content.
When the user changes to another organization we navigate to an almost empty page for the transition (to unmount all organization specific queries)
Then we call resetStore to refetch top-level queries that "might" be affected by the change of context.
Once everything resolves, we navigate back to the normal app.
This used to work fine for the last 5 years, then we updated to apollo client version 3.
After some digging we found out that after calling resetStore some queries that are unmounted (confirmed 100%) end up being refetched.
I debugged through the resetStore method, but I found no calls to refetch those queries.
So right now I have no idea what could cause those unmounted queries to be requested once again.
The biggest problem I have right now is that because those queries are called from the wrong context, it causes another mechanism to automatically redirect to the previous organization (so direct links work in any organization) I have a lot of fail safe in place and it's fine 90% of the time, but when the network is slower those fail safe are not enough and it leads to the user no being able to change organization.
Any guidance on what could cause an unmounted query to hit the network once more after calling resetStore would be appreciated Or tips on how to find out
Link to Reproduction
https://codesandbox.io/p/devbox/jolly-wilbur-n7m7jz?layout=%257B%2522sidebarPanel%2522%253A%2522EXPLORER%2522%252C%2522rootPanelGroup%2522%253A%257B%2522direction%2522%253A%2522horizontal%2522%252C%2522contentType%2522%253A%2522UNKNOWN%2522%252C%2522type%2522%253A%2522PANEL_GROUP%2522%252C%2522id%2522%253A%2522ROOT_LAYOUT%2522%252C%2522panels%2522%253A%255B%257B%2522type%2522%253A%2522PANEL_GROUP%2522%252C%2522contentType%2522%253A%2522UNKNOWN%2522%252C%2522direction%2522%253A%2522vertical%2522%252C%2522id%2522%253A%2522clwtppgmj00073p6i6i7ie1cc%2522%252C%2522sizes%2522%253A%255B70%252C30%255D%252C%2522panels%2522%253A%255B%257B%2522type%2522%253A%2522PANEL_GROUP%2522%252C%2522contentType%2522%253A%2522EDITOR%2522%252C%2522direction%2522%253A%2522horizontal%2522%252C%2522id%2522%253A%2522EDITOR%2522%252C%2522panels%2522%253A%255B%257B%2522type%2522%253A%2522PANEL%2522%252C%2522contentType%2522%253A%2522EDITOR%2522%252C%2522id%2522%253A%2522clwtppgmj00023p6iumaf2w8i%2522%257D%255D%257D%252C%257B%2522type%2522%253A%2522PANEL_GROUP%2522%252C%2522contentType%2522%253A%2522SHELLS%2522%252C%2522direction%2522%253A%2522horizontal%2522%252C%2522id%2522%253A%2522SHELLS%2522%252C%2522panels%2522%253A%255B%257B%2522type%2522%253A%2522PANEL%2522%252C%2522contentType%2522%253A%2522SHELLS%2522%252C%2522id%2522%253A%2522clwtppgmj00043p6igepxit5p%2522%257D%255D%252C%2522sizes%2522%253A%255B100%255D%257D%255D%257D%252C%257B%2522type%2522%253A%2522PANEL_GROUP%2522%252C%2522contentType%2522%253A%2522DEVTOOLS%2522%252C%2522direction%2522%253A%2522vertical%2522%252C%2522id%2522%253A%2522DEVTOOLS%2522%252C%2522panels%2522%253A%255B%257B%2522type%2522%253A%2522PANEL%2522%252C%2522contentType%2522%253A%2522DEVTOOLS%2522%252C%2522id%2522%253A%2522clwtppgmj00063p6ixzewo28y%2522%257D%255D%252C%2522sizes%2522%253A%255B100%255D%257D%255D%252C%2522sizes%2522%253A%255B69.49598743432026%252C30.504012565679744%255D%257D%252C%2522tabbedPanels%2522%253A%257B%2522clwtppgmj00023p6iumaf2w8i%2522%253A%257B%2522id%2522%253A%2522clwtppgmj00023p6iumaf2w8i%2522%252C%2522activeTabId%2522%253A%2522clwtpsi76003n3p6i7vuk56vl%2522%252C%2522tabs%2522%253A%255B%257B%2522id%2522%253A%2522clwtppgmj00013p6ipicylgfm%2522%252C%2522mode%2522%253A%2522permanent%2522%252C%2522type%2522%253A%2522FILE%2522%252C%2522filepath%2522%253A%2522%252FREADME.md%2522%252C%2522state%2522%253A%2522IDLE%2522%257D%252C%257B%2522id%2522%253A%2522clwtpsi76003n3p6i7vuk56vl%2522%252C%2522mode%2522%253A%2522permanent%2522%252C%2522type%2522%253A%2522FILE%2522%252C%2522initialSelections%2522%253A%255B%257B%2522startLineNumber%2522%253A25%252C%2522startColumn%2522%253A27%252C%2522endLineNumber%2522%253A25%252C%2522endColumn%2522%253A27%257D%255D%252C%2522filepath%2522%253A%2522%252Fsrc%252Flink.js%2522%252C%2522state%2522%253A%2522IDLE%2522%257D%252C%257B%2522type%2522%253A%2522FILE%2522%252C%2522filepath%2522%253A%2522%252F.codesandbox%252Ftasks.json%2522%252C%2522id%2522%253A%2522clwumzut3002e3p6gxrqqie21%2522%252C%2522mode%2522%253A%2522permanent%2522%252C%2522state%2522%253A%2522IDLE%2522%257D%255D%257D%252C%2522clwtppgmj00063p6ixzewo28y%2522%253A%257B%2522id%2522%253A%2522clwtppgmj00063p6ixzewo28y%2522%252C%2522activeTabId%2522%253A%2522clxyzoh8p000l3j6jh401n3i3%2522%252C%2522tabs%2522%253A%255B%257B%2522id%2522%253A%2522clwtppgmj00053p6iqkbdkq6f%2522%252C%2522mode%2522%253A%2522permanent%2522%252C%2522type%2522%253A%2522TASK_PORT%2522%252C%2522taskId%2522%253A%2522start%2522%252C%2522port%2522%253A3000%252C%2522path%2522%253A%2522%252F%2522%257D%252C%257B%2522type%2522%253A%2522SETUP_TASKS%2522%252C%2522id%2522%253A%2522clxyzmn6r000o3j6jbsttjmyy%2522%252C%2522mode%2522%253A%2522permanent%2522%257D%252C%257B%2522type%2522%253A%2522UNASSIGNED_PORT%2522%252C%2522port%2522%253A2222%252C%2522id%2522%253A%2522clxyzn4b5001p3j6jw7ckv2w6%2522%252C%2522mode%2522%253A%2522permanent%2522%257D%252C%257B%2522type%2522%253A%2522TASK_PORT%2522%252C%2522port%2522%253A3000%252C%2522taskId%2522%253A%2522start%2522%252C%2522id%2522%253A%2522clxyzoh8p000l3j6jh401n3i3%2522%252C%2522mode%2522%253A%2522permanent%2522%257D%255D%257D%252C%2522clwtppgmj00043p6igepxit5p%2522%253A%257B%2522id%2522%253A%2522clwtppgmj00043p6igepxit5p%2522%252C%2522activeTabId%2522%253A%2522clwtppgmj00033p6iw9cgtvcz%2522%252C%2522tabs%2522%253A%255B%257B%2522id%2522%253A%2522clwtppgmj00033p6iw9cgtvcz%2522%252C%2522mode%2522%253A%2522permanent%2522%252C%2522type%2522%253A%2522TASK_LOG%2522%252C%2522taskId%2522%253A%2522start%2522%257D%252C%257B%2522id%2522%253A%2522clwtqf2io00a13p6m5m1c3s1n%2522%252C%2522mode%2522%253A%2522permanent%2522%252C%2522type%2522%253A%2522TERMINAL%2522%252C%2522shellId%2522%253A%2522clwtqf2lx0074dke2ga51ahaj%2522%257D%255D%257D%257D%252C%2522showDevtools%2522%253Atrue%252C%2522showShells%2522%253Atrue%252C%2522showSidebar%2522%253Atrue%252C%2522sidebarPanelSize%2522%253A15%257D
Reproduction Steps
- Go to "Home"
- Go to "Home2"
- Click "Reset Store" button
- You'll notice the query "AllPeople" (only mounted in Home not Home2) will be sent when it should not
@apollo/client version
3.10.4
Here are my findings so far (no repro still -_-)
I have a query on some page and it is set as "dirty" at some point https://github.com/apollographql/apollo-client/blob/d914d689abd9a3364cfa9c6bef8617d97df67a1c/src/core/QueryInfo.ts#L229-L234
Then the query is unmounted, but remains "dirty"
Later we call resetStore which calls reFetchObservableQueries
https://github.com/apollographql/apollo-client/blob/d914d689abd9a3364cfa9c6bef8617d97df67a1c/src/core/ApolloClient.ts#L594-L603
In reFetchObservableQueries we call broadcastQueries
https://github.com/apollographql/apollo-client/blob/d914d689abd9a3364cfa9c6bef8617d97df67a1c/src/core/QueryManager.ts#L969
In broadcastQueries we call notify for ALL queries even inactive ones
https://github.com/apollographql/apollo-client/blob/d914d689abd9a3364cfa9c6bef8617d97df67a1c/src/core/QueryManager.ts#L1072-L1075
In notify for my problematic query, because it is "dirty" we call the listener which ends up calling reobserverCacheFirst on that query causing to send it to the server
https://github.com/apollographql/apollo-client/blob/d914d689abd9a3364cfa9c6bef8617d97df67a1c/src/core/QueryInfo.ts#L251-L272
So it seems like at some point we should ignore "inactive" queries, aka observableQuery.hasObservers() is false
I think the right place to do this check would be in QueryInfo.shouldNotify
https://github.com/apollographql/apollo-client/blob/d914d689abd9a3364cfa9c6bef8617d97df67a1c/src/core/QueryInfo.ts#L289-L292
Otherwise, maybe we should NOT call broadcastQueries in the first place from within reFecthObservableQueries ?
Alright after going through this whole effort, I managed to make my minimal repro work!!
@Cellule thanks for the bug report!
The reproduction is super helpful! I've cloned your reproduction and started looking into this to understand myself. Its close to the weekend and I'm not sure I'll be able to finish until early next week, but thanks so much for your thorough analysis. This will definitely save me a ton of time. I'll get back to you as soon as I can!
@Cellule looking at this further, your diagnosis seems correct. I had originally assumed there was a bug holding onto the ObservableQuery in the this.queries map inside QueryManager, but that doesn't appear to be the case (hence my assumption in https://github.com/apollographql/apollo-client/issues/11894#issuecomment-2197707409).
I can now see that the only reason you're getting the refetch is because the ObservableQuery instance from the first page hasn't yet been garbage collected, so the notify call that you pointed out in your deep dive is triggering the refetch.
I've updated your reproduction to show that once ObservableQuery is garbage collected, we're able to see only a single network request (try clicking the "reset store" button after you see the log message about it getting garbage collected): https://codesandbox.io/p/devbox/fervent-scooby-c5xwk3. This at least confirms that we aren't holding onto that ObservableQuery instance somewhere else in the client 🎉.
I've discussed this with the team to make sure adding a check for hasObservers wouldn't have any other adverse effects, and we couldn't think of any. I think your suggested fix makes sense. I'd like to dig into the proper place for this, but otherwise I think we should move forward with this fix. If you're interested in submitting a PR, I'd be happy to review, otherwise I'll try and get something up within the next couple days.
Thanks again for your thorough deep dive through the code and the time getting a reproduction! This made my life so much easier ❤️.
@jerelmiller Great to hear! I was about to open a PR then figured I should write a test for this. However, I can't figure out how to reach the state where there's an inactive ObservableQuery in the QueryManager. Actually, the more I review the code, the less I understand how it's possible to have an inactive query in the query manager 😅
Haha right?!?! Thats why I was so confused why you were able to reproduce this because technically it shouldn't be possible 😆.
From what I understand, its the fact that the ObservableQuery instance still exists in memory and hasn't been garbage collected yet, so its able to respond to updates when broadcastQueries is run. The best way I could think to try and approach a test would be:
If you want to test this at the React level with useQuery:
- Render a component with one query using
useQuery. - After the fetch is complete, unmount that component and mount another component with another
useQuery(a ternary would probably work just fine here). Make sure these are different queries so that query deduplication doesn't kick in giving you a false positive. - Execute
client.reFetchObservableQueries - assert that only one of the two queries is executed.
If you opt for this approach, we have a handy Profiler utility that makes it easier to assert render-by-render. Here is an example of its usage with useQuery: https://github.com/apollographql/apollo-client/blob/1a9c68a4e6bc033498ce620681ae4a3720820ba4/src/react/hooks/tests/useQuery.test.tsx#L4177-L4432
If you want to test this with core APIs:
- Create an observable with
client.watchQuery - Subscribe to it, let the fetch finish, then unsubscribe from it
- Create a 2nd observable with
client.watchQuery. Ensure its a different query so that query deduplication doesn't kick in giving you a false positive. - Subscribe to it, let the fetch finish
- Execute
client.reFetchObservableQueries - Assert that only one of the two queries is executed.
If you opt for this approach, we have a handy ObservableStream utility that captures each emitted result that you can assert on in a more synchronous way. Here is an example of its usage: https://github.com/apollographql/apollo-client/blob/1a9c68a4e6bc033498ce620681ae4a3720820ba4/src/core/tests/ObservableQuery.ts#L2393-L2509
If you're struggling with either of these, let me know and I'd be happy to take a stab at creating a failing test for this to use as a base for implementing the change.
So I'm not 100% sure why, but my current investigation leads me to this bit of code where InternalState.useObservableQuery will call this.client.watchQuery twice for the same query (can be seen on page load of my repro).
The first one created seems to be abandoned, but remains in the list of query of the QueryManager.
https://github.com/apollographql/apollo-client/blob/116723d88b5600126cb266060ca58ce989d3988e/src/react/hooks/useQuery.ts#L531-L569
So in the end, it seems it's not an issue of query being cleaned up left in the QueryManager. It's a separate query that was never properly registered that is somehow in the list. When resetStore is called, that "never alive" query is revived. So maybe a proper fix would be to avoid this duplicate in the first place since I suspect there might be lots of "uninitialized" queries
That sounds very familiar to https://github.com/apollographql/apollo-client/pull/11837 where I noticed something similar. Are you using React's strict mode by chance? I believe the the issue here was that the strict mode caused useState to run twice, which meant useInternalState was creating 2 instances of InternalState, which ran watchQuery twice: https://github.com/apollographql/apollo-client/blob/116723d88b5600126cb266060ca58ce989d3988e/src/react/hooks/useQuery.ts#L125
Take a look at the PR description in https://github.com/apollographql/apollo-client/pull/11837. Perhaps you might be seeing something similar. Funny enough, the fix for that one was also doing a check for hasObservers 😆
It does look like StrictMode is the offender in the repro above where useRef in useInternalState is empty on both renders causing to create an ObservableQuery twice
It would point to bad cleanup in useInternalState.
However, I'm hitting this issue in production where StrictMode should do nothing. Let me see if I can repro without StrictMode
My biggest fear is that there's a memory leak somehow. I haven't been able to confirm it either in a repro nor in our production application, so I suppose that for now the initial patch should do the trick. I'll try to monitor afterward in case memory leak occurs
As for testing, I managed to write a test that fails without the fix and passes with it, it's not the cleanest approach, but let's discuss this on the PR directly
Thanks for the PR! I'll take a look when I'm back on Monday. Have a great weekend!
@jerelmiller
Someone on my team, @lorenzopicoli , found another related issue.
In one of our mutations, we're using the refetchQueries option
const [mutation] = useMutation<ICreateAssetMutationResp, ICreateAssetMutationArgs>(
createAssetMutation,
{
ignoreResults: true,
refetchQueries: [
// needed because a sub-asset can be created at any level in the sub-assets page
// called if in /assets/:assetId/subs, to refresh the descendants count in treeView
"AssetParentTreeView",
// called if in /assets/:assetId, to refresh the descendants count in sub-asset tree-view
"AssetDetailsQuery",
],
}
)
I went down through the code and ended up in QueryManager.getObservableQueries
https://github.com/Cellule/apollo-client/blob/823b26e1073ae9c50d7c70078ea99e758975dab6/src/core/QueryManager.ts#L852-L900
It seems if include is an array of strings, it'll return queries matching the name even if they're not active.
I feel this is a mistake, but I can't 100% confirm. I'll add a proposed fix in my PR for you to take a look
I dug a bit more into it. From what I can see here's what happen (exacerbated by StrictMode, but possible without)
- useQuery is called
- client.watchQuery is called creating an observableQuery
- React.useEffect() (actually useSyncExternalStore) is called to begin subscribing to the subscription
- Something causes the component to unmount immediately (like StrictMode). The useEffect to start the subscription is never called Apparently this is possible when mounting/unmounting component relatively fast
- Another query mount and goes through the process normally
- InMemoryCache calls
broadcastWatchafter getting results from the "second" query, this causes the first query to becomedirty - And bad state reached
So clearly StrictMode is being a huge offender of this problem. I can see that the useState initializer function runs twice in StrictMode https://github.com/apollographql/apollo-client/blob/80471816396fa023fa2dc3b768ceb3da5b22edf4/src/react/hooks/useQuery.ts#L179-L205
https://react.dev/reference/react/useState#caveats
Since there's no proper "cleanup" for the observable that is created in useState, we always end up with 2 observable for every query.
According to this
https://github.com/facebook/react/issues/20090#issuecomment-715926549
Creating a watcher/listener in useState is an anti-pattern and that's why it causes such problems in StrictMode
I'll try to come up with a better repro without StrictMode, but I do believe we should figure out and fix that issue in StrictMode as well to respect the rules of React
@Cellule thanks for that confirmation. I suspected strict mode was the cause of at least one of the cases and agree we should try and do a better job here to avoid he double initialization. Do let me know if you're able to reproduce it without strict mode though. I'd be curious how that is allowed to happen, but my guess is the underlying cause is still very similar (e.g. multiple ObservableQuery instances are created)
Now I'm starting to wonder if the main issue was fixed (aka dead queries in production)
When I initially encountered this issue we were on apollo-client 3.10.4, we are currently on 3.10.8 and I can't seem to repro
I reviewed the diff between those 2 versions and it does look like something changed around how useQuery handles its state
v3.10.4...v3.10.8
Or more specifically from this PR https://github.com/apollographql/apollo-client/pull/11851
You probably have more context around that particular change, do you think this solved it ?
If yes, then maybe we just have to focus on fixing the useState(createInternalState) in StrictMode
@Cellule ok thats good news! That change was made to provide better compatibility with the new React Compiler. We were violating a few rules of React that would have made it difficult to integrate with it. Very possible the move away from the ref and into state helps here.
Great, I updated the title of the issue to better represent the issue.