`onSnapshot` getting out of sync when `useFetchStreams` enabled and brittle network

Open neelance opened this issue 8 months ago • 10 comments

Operating System

primarily Android, but not always

Environment (if applicable)

primarily on mobile devices, but not always

Firebase SDK Version

11.6.0

Firebase SDK Product(s)

Firestore

Project Tooling

Web app with Webpack

Detailed Problem Description

We're seeing that for some users, onSnapshot for a query does not keep up with changes on the server. It seems to miss a change and it does not catch up later.

Unfortunately we cannot reproduce the issue at will. It seems to be a race condition that only happens on brittle internet connections, and we do not have a setup to hammer a test with simulated connection issues until we see it happen.

What we can see in our error tracking is that in many occurrences the browser logged the error `@firebase/firestore: Firestore (11.6.0): WebChannelConnection RPC 'Listen' stream 0x24eae879 transport errored: [object Object]` earlier in its JS console.

We also found out that if we pass useFetchStreams: false to initializeFirestore, then the issue goes away completely.

Steps and code to reproduce issue

neelance avatar Apr 08 '25 08:04 neelance

I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.

google-oss-bot avatar Apr 08 '25 08:04 google-oss-bot

Hi @neelance, thank you for reporting this issue. Could you please provide more context on "It seems to miss a change and it does not catch up later."? Does the listener miss one snapshot while the next ones are still in sync with the server, or are some changes on the server missing entirely?

Since the bug is not reproducible, it is hard to debug. Maybe we can extract some more context out of the `@firebase/firestore: Firestore (11.6.0): WebChannelConnection RPC 'Listen' stream 0x24eae879 transport errored: [object Object]` error message. Could you please try using a custom build from this branch?

milaGGL avatar Apr 08 '25 19:04 milaGGL

Does the listener miss one snapshot while the next ones are still in sync with the server, or are some changes on the server missing entirely?

I can't really say. At a certain point the frontend knows that the backend has just written to a certain document (after processing a purchase). We added code that logs an error if this change has not become visible in the frontend after 30 seconds. This is the clearest indication of the bug that we have (before adding this error logging, we only saw strange business logic states that "should not happen").
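A check like the one described above could be sketched roughly as follows. This is a minimal illustration, not the reporter's actual code: `watchForChange`, `Subscribe`, and `Unsubscribe` are hypothetical names, and the helper takes a generic subscribe function so that it does not depend on the Firebase SDK.

```typescript
// Hypothetical helper for illustration; not part of the Firebase SDK.
type Unsubscribe = () => void;
type Subscribe<T> = (onNext: (value: T) => void) => Unsubscribe;

/**
 * Calls `onResult(true)` as soon as a value satisfying `isExpected` arrives,
 * or `onResult(false)` if none arrives within `timeoutMs`.
 * Returns a function that cancels the watch without reporting a result.
 */
function watchForChange<T>(
  subscribe: Subscribe<T>,
  isExpected: (value: T) => boolean,
  timeoutMs: number,
  onResult: (seen: boolean) => void,
): () => void {
  let done = false;
  let unsubscribe: Unsubscribe | undefined;
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Stop listening and clear the timer; returns true only on the first call.
  const stop = (): boolean => {
    if (done) return false;
    done = true;
    if (timer !== undefined) clearTimeout(timer);
    if (unsubscribe) unsubscribe();
    return true;
  };

  timer = setTimeout(() => {
    if (stop()) onResult(false);
  }, timeoutMs);

  unsubscribe = subscribe((value) => {
    if (isExpected(value) && stop()) onResult(true);
  });
  // If the listener matched synchronously, `unsubscribe` was not yet assigned
  // when `stop` ran, so release the subscription now.
  if (done) unsubscribe();

  return () => {
    stop();
  };
}
```

With Firestore, `subscribe` could plausibly be something like `(onNext) => onSnapshot(docRef, onNext)`, with `onResult` logging an error when it receives `false` after a 30000 ms timeout; the document reference and the predicate would depend on the application.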

Additionally, when we call getDocsFromServer in such a situation, we still get the old documents even though we are sure that the document was written. This is because getDocsFromServer does not actually fetch from the server again if there is an active onSnapshot listener on the same query. Firestore then seems to assume that it already has the latest data, so it does not query again (we were able to confirm this behavior via local testing). But with this bug, it is not really the latest data, even after doing some other Firestore actions in the meantime.

Just to mention it again: Setting useFetchStreams: false resolves our issue, so it is unlikely that it is a bug in our own code.

Could you please try using a custom build from this https://github.com/firebase/firebase-js-sdk/pull/8907?

As I can only reproduce this in production, it is not easy to push a custom build into our CI pipeline. What I could do instead is wait until https://github.com/firebase/firebase-js-sdk/pull/8907 has landed in a proper release and then temporarily set useFetchStreams: true to capture a new error message from production.

neelance avatar Apr 09 '25 08:04 neelance

getDocsFromServer sharing the existing stream is intended behaviour. The underlying bug is still the real-time listener missing changes from the backend.

#8907 was merged today. I will update the thread once it is released.

Would it be possible to set the log level to "debug" and collect the logs for the same process with useFetchStreams set to true and to false? With debug-level logging, we should be able to check what we are receiving from the server and compare the differences.
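A minimal sketch of that comparison setup, assuming the modular `firebase/firestore` API. `setLogLevel` and `initializeFirestore` are SDK functions; the placeholder config, the `useFetchStreams` constant, and the `as any` cast are assumptions for illustration (the cast is only there in case the option is not part of the public settings typings).

```typescript
import { initializeApp } from "firebase/app";
import { initializeFirestore, setLogLevel } from "firebase/firestore";

// Placeholder config (assumption): replace with the project's real values.
const app = initializeApp({ projectId: "your-project-id" });

// Emit debug-level Firestore logs to the console.
setLogLevel("debug");

// Flip this between runs to collect one set of logs per transport mode.
const useFetchStreams = true;

// Assumption: the cast works around `useFetchStreams` possibly being absent
// from the public settings type; the option itself is the one named in this issue.
const db = initializeFirestore(app, { useFetchStreams } as any);
```

Recording the value of `useFetchStreams` alongside the captured logs should make the two runs easier to tell apart.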

It would be appreciated if you could provide a minimal repro app, so that we can debug it on our side.

milaGGL avatar Apr 09 '25 15:04 milaGGL

It would be appreciated if you could provide a minimal repro app, so that we can debug it on our side.

I'd love to, but as mentioned earlier it is not easy to come up with a test setup:

Unfortunately we cannot reproduce the issue at will. It seems to be a race condition that only happens on brittle internet connections, and we do not have a setup to hammer a test with simulated connection issues until we see it happen.

Any ideas?

neelance avatar Apr 10 '25 08:04 neelance

@neelance

Do you turn on multitab support in your app?

wu-hui avatar Apr 14 '25 17:04 wu-hui

Do you turn on multitab support in your app?

No. We are not using any persistence feature.

neelance avatar Apr 14 '25 19:04 neelance

Hi @neelance, could you please try upgrading your SDK version to V11.6.1 or higher, and collect the error message again?

milaGGL avatar Jun 04 '25 19:06 milaGGL

Hey @neelance. We need more information to resolve this issue but there hasn't been an update in 5 weekdays. I'm marking the issue as stale and if there are no new updates in the next 5 days I will close it automatically.

If you have more information that will help us get to the bottom of this, just add a comment!

google-oss-bot avatar Jun 11 '25 01:06 google-oss-bot

@milaGGL I just pushed a change to our codebase to temporarily set useFetchStreams: true. I will now monitor our logging to see when the issue happens again and to catch the new error message.

neelance avatar Jun 15 '25 16:06 neelance

Setting useFetchStreams: true made the issues return. I looked into a few logs and the Firestore connection error always looks like this:

@firebase/firestore: Firestore (11.8.0): WebChannelConnection RPC 'Listen' stream 0x2f4f2fdd transport errored. Name: undefined Message: undefined

Unfortunately that's not much more helpful than the previous [object Object].
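Both messages would be consistent with the transport error being a plain object rather than an `Error` instance, though that is only a guess from the log text. A minimal sketch of how such a value could be logged more usefully, assuming a hypothetical helper `describeUnknownError` that is not part of the Firebase SDK:

```typescript
// Hypothetical helper for illustration; this is not the SDK's logging code.
function describeUnknownError(err: unknown): string {
  if (err instanceof Error) {
    return `${err.name}: ${err.message}`;
  }
  if (typeof err === "object" && err !== null) {
    // Copy enumerable properties (own and inherited) into a plain object so
    // that they can be serialised.
    const fields: Record<string, unknown> = {};
    for (const key in err) {
      fields[key] = (err as Record<string, unknown>)[key];
    }
    try {
      return JSON.stringify(fields);
    } catch {
      // Values such as circular references cannot be serialised;
      // fall back to listing the property names.
      return `{${Object.keys(fields).join(", ")}}`;
    }
  }
  return String(err);
}

// A plain object has no useful string form and no `name` or `message`.
const transportError = { status: 0, type: "network" }; // made-up example value
console.log(String(transportError)); // "[object Object]"
console.log(describeUnknownError(transportError)); // {"status":0,"type":"network"}
```

The example value is invented; what the WebChannel transport actually passes to the error handler is not known from this thread.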

@milaGGL What should we try next?

neelance avatar Jun 20 '25 15:06 neelance

@milaGGL Could you please take another look? 🙏

neelance avatar Jul 22 '25 15:07 neelance

@neelance, sorry for the trouble this is causing you. @milaGGL is not available so I'm going to take this one.

I read through the issue to get up to speed. Given what we know, it still seems like the best path forward is to try to get more information out of this WebChannel error. I opened a PR that should do that. Once we get that merged, hopefully we can move this forward.

MarkDuckworth avatar Jul 23 '25 23:07 MarkDuckworth