`onSnapshot` getting out of sync when `useFetchStreams` enabled and brittle network
Operating System
primarily Android, but not always
Environment (if applicable)
primarily on mobile devices, but not always
Firebase SDK Version
11.6.0
Firebase SDK Product(s)
Firestore
Project Tooling
Web app with Webpack
Detailed Problem Description
We're seeing that for some users, onSnapshot for a query does not keep up with changes on the server. It seems to miss a change and it does not catch up later.
Unfortunately we cannot reproduce the issue at will. It seems to be a race condition that only happens on brittle internet connections, and we do not have a setup that hammers a test with simulated connection issues until the problem shows up.
What we can see in our error tracking is that in many occurrences the browser logged the error `@firebase/firestore: Firestore (11.6.0): WebChannelConnection RPC 'Listen' stream 0x24eae879 transport errored: [object Object]` earlier in its JS console.
We also found out that if we pass useFetchStreams: false to initializeFirestore, then the issue goes away completely.
Steps and code to reproduce issue
Hi @neelance, thank you for reporting this issue. Could you please provide more context on "It seems to miss a change and it does not catch up later."? Does the listener miss one snapshot while the following ones are still in sync with the server, or are some changes on the server missing completely?
Since the bug is not reproducible, it is hard to debug. Maybe we can extract some more context out of the @firebase/firestore: Firestore (11.6.0): WebChannelConnection RPC 'Listen' stream 0x24eae879 transport errored: [object Object] error message. Could you please try using a custom build from this branch?
> Does the listener miss one snapshot while the following ones are still in sync with the server, or are some changes on the server missing completely?
I can't really say. What we are seeing is that at a certain point the frontend knows that the backend just wrote to a certain document (after processing a purchase). We added some additional code to log an error if this change does not become visible in the frontend within 30 seconds. This is the clearest indication of the bug that we have (before this additional error logging, we only saw strange business logic states that "should not happen").
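The 30-second check described above could be sketched roughly as follows. This is a minimal illustration, not the code used in the affected app: `watchForExpectedChange`, `Subscribe`, `Schedule`, and `timeoutSchedule` are hypothetical names, and `Subscribe` only assumes the callback-in, unsubscribe-out shape that `onSnapshot` has.

```typescript
// Hypothetical shapes, not part of the Firebase API: `Subscribe` mirrors how
// onSnapshot takes a callback and returns an unsubscribe function.
type Unsubscribe = () => void;
type Subscribe<T> = (onNext: (value: T) => void) => Unsubscribe;
// Starts a timer and returns a function that cancels it (injectable for tests).
type Schedule = (fn: () => void, ms: number) => () => void;

const timeoutSchedule: Schedule = (fn, ms) => {
  const id = setTimeout(fn, ms);
  return () => clearTimeout(id);
};

function watchForExpectedChange<T>(
  subscribe: Subscribe<T>,
  isExpected: (value: T) => boolean,
  onMissed: () => void,
  timeoutMs: number = 30_000,
  schedule: Schedule = timeoutSchedule,
): Unsubscribe {
  const state: {
    settled: boolean;
    cancelTimer?: () => void;
    unsubscribe?: Unsubscribe;
  } = { settled: false };

  // Stop the timer and detach the listener exactly once.
  const settle = (): void => {
    if (state.settled) return;
    state.settled = true;
    state.cancelTimer?.();
    state.unsubscribe?.();
  };

  state.cancelTimer = schedule(() => {
    if (state.settled) return;
    onMissed(); // the expected change never became visible in time
    settle();
  }, timeoutMs);

  const unsubscribe = subscribe((value) => {
    if (isExpected(value)) settle();
  });
  if (state.settled) {
    // The listener already fired synchronously inside subscribe(); detach now.
    unsubscribe();
  } else {
    state.unsubscribe = unsubscribe;
  }
  return settle;
}
```

In an app, `subscribe` could wrap an `onSnapshot` call for the affected query, `isExpected` could test for the document state the backend is known to have written, and `onMissed` could report to error tracking. The returned function cancels the check early.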
Additionally, when using getDocsFromServer in such a situation, we still get the old documents even though we are sure that the document was written. This is because getDocsFromServer does not really fetch again from the server if there is an active onSnapshot binding on the same query. Firestore then seems to assume that it already has the latest data, so it does not query again (we were able to confirm this behavior via local testing). But with this bug, it is not really the latest data, even after doing some other Firestore actions in the meantime.
Just to mention it again: Setting useFetchStreams: false resolves our issue, so it is unlikely that it is a bug in our own code.
> Could you please try using a custom build from https://github.com/firebase/firebase-js-sdk/pull/8907?
As I can only reproduce this in production, it is not easy to push a custom build into our CI pipeline. What I could do instead is wait until https://github.com/firebase/firebase-js-sdk/pull/8907 has landed in a proper release and then temporarily set useFetchStreams: true to capture a new error message from production.
getDocsFromServer sharing the existing stream is intended behaviour. The underlying bug is still the real-time listener missing changes from the backend.
#8907 was merged today. I will update the thread once it is released.
Would it be possible to set the log level to "debug" and collect the logs for the same process, once with useFetchStreams set to true and once with it set to false? With debug-level logging, we should be able to check what we are receiving from the server and compare the differences.
It would be appreciated if you could provide a minimal repro app, so that we can debug it on our side.
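For the debug logging requested above, a minimal sketch, assuming the modular Web SDK used in this thread, where `setLogLevel` is exported from `firebase/firestore`:

```typescript
import { setLogLevel } from "firebase/firestore";

// Turn on verbose Firestore SDK logging before attaching any listeners,
// so that Listen stream traffic shows up in the JS console.
setLogLevel("debug");
```

Debug output is verbose, so it is probably best captured only for the sessions being compared.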
> It would be appreciated if you could provide a minimal repro app, so that we can debug it on our side.
I'd love to, but as mentioned earlier it is not easy to come up with a test setup:
> Unfortunately we cannot reproduce the issue at will. It seems to be a race condition that only happens on brittle internet connections, and we do not have a setup that hammers a test with simulated connection issues until the problem shows up.
Any ideas?
@neelance
Do you turn on multitab support in your app?
> Do you turn on multitab support in your app?
No. We are not using any persistence feature.
Hi @neelance, could you please try upgrading your SDK version to V11.6.1 or higher, and collect the error message again?
Hey @neelance. We need more information to resolve this issue but there hasn't been an update in 5 weekdays. I'm marking the issue as stale and if there are no new updates in the next 5 days I will close it automatically.
If you have more information that will help us get to the bottom of this, just add a comment!
@milaGGL I just pushed a change to our codebase to temporarily set useFetchStreams: true. I will now monitor our logging to see when the issue happens again and to catch the new error message.
Setting useFetchStreams: true made the issue return. I looked into a few logs and the Firestore connection error always looks like this:
`@firebase/firestore: Firestore (11.8.0): WebChannelConnection RPC 'Listen' stream 0x2f4f2fdd transport errored. Name: undefined Message: undefined`
Unfortunately that's not much more helpful than the previous [object Object].
@milaGGL What should we try next?
@milaGGL Could you please take another look? 🙏
@neelance, sorry for the trouble this is causing you. @milaGGL is not available so I'm going to take this one.
I read through the issue to get up to speed. Given what we know, it still seems like the best path forward is to try to get more information out of this WebChannel error. I opened a PR that should do that. Once we get that merged, hopefully we can move this forward.