firebase-js-sdk

Hanging query for Firestore

Open thomasdao opened this issue 1 year ago • 28 comments

Operating System

Both Mac and Windows

Browser Version

Chrome, Electron Browser window

Firebase SDK Version

10.7.0, 10.7.1

Firebase SDK Product:

Firestore

Describe your project's tooling

Plain Electron app

Describe the problem

This is a new ticket for the hanging query issue, following up from https://github.com/firebase/firebase-js-sdk/pull/7771 and https://github.com/firebase/firebase-js-sdk/issues/7652

After updating Firebase to 10.7.0 or 10.7.1, the query becomes a lot slower and frequently gets stuck with the error below:

@firebase/firestore: Firestore (10.7.0): WebChannelConnection RPC 'Listen' stream 0x5b9a037f transport errored: Wn {type: 'c', target: Hn, g: Hn, defaultPrevented: false, status: 1}

After switching back to 10.6.0, the query completes quickly.

Steps and code to reproduce issue

I've created a minimal sample that reproduces this issue and shared it with @MarkDuckworth. If you need access to the private repo, please let me know. Thank you!

thomasdao avatar Dec 12 '23 10:12 thomasdao

Thanks for reporting @thomasdao. I'll try to reproduce it

ehsannas avatar Dec 12 '23 21:12 ehsannas

@ehsannas thanks, I've invited you to the sample project :)

thomasdao avatar Dec 13 '23 01:12 thomasdao

Thanks @thomasdao. I am able to see the error in the logs from your repo. I do, however, see that each such log message is followed by an UNAVAILABLE code from the backend, which means it's a legitimate error returned from the backend to the SDK. It's plausible that the newer WebChannel version has become much more efficient at sending parallel requests to the backend, such that you're hitting a request-rate limit for a single client. This error code is retryable with a backoff, which means the SDK will recover and rerun the query after some delay.
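
To illustrate what "retryable with a backoff" means, here is a minimal sketch of an exponential backoff schedule. The function name backoffDelaysMs and the parameter values (1 s initial delay, factor 1.5, 60 s cap) are assumptions chosen for illustration, not the SDK's actual internals; real implementations typically also add random jitter.

```typescript
// Hypothetical helper: computes the wait (in ms) before each retry attempt.
// All parameter values are illustrative assumptions, not the SDK's settings.
function backoffDelaysMs(
  attempts: number,
  initialMs = 1000,
  factor = 1.5,
  maxMs = 60000
): number[] {
  const delays: number[] = [];
  let current = initialMs;
  for (let i = 0; i < attempts; i++) {
    // Cap each delay so repeated failures never wait longer than maxMs.
    delays.push(Math.min(current, maxMs));
    current *= factor;
  }
  return delays;
}

console.log(backoffDelaysMs(4)); // [1000, 1500, 2250, 3375]
```

Under such a schedule, a query that keeps failing is retried less and less often, which from the outside can look like a hang even though the client is still retrying.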

Please take a look at:

  • https://firebase.google.com/docs/firestore/real-time_queries_at_scale#understand_high_write_traffic_in_the_system
  • https://firebase.google.com/docs/firestore/best-practices#ramping_up_traffic

ehsannas avatar Dec 14 '23 16:12 ehsannas

@ehsannas I've never seen the UNAVAILABLE code, even if I wait for more than 10 minutes.

I don't find the explanation that the "newer WebChannel version has become much more efficient at sending parallel requests" really logical: the same type of query works with version 10.6.0, which indicates that the server is able to handle that query and that the problem likely lies with the newer version of the client.

I've tested adding a delay of 1 second between each paginated query to reduce server load, and I still see the same error: @firebase/firestore: Firestore (10.7.0): WebChannelConnection RPC 'Listen' stream 0x269fb953 transport errored: Wn {type: 'c', target: Hn, g: Hn, defaultPrevented: false, status: 1}.
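
For context, the throttling experiment described above can be sketched as a paginated loop with a pause between pages. This is not the code from the private reproduction; Page, sleep, and fetchAllWithDelay are hypothetical names, and fetchPage stands in for whatever runs one page of the query (for example, a Firestore query with a limit and a cursor).

```typescript
// One page of results plus the cursor for the next page (undefined = done).
interface Page<T, C> {
  items: T[];
  nextCursor?: C;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: fetches every page, pausing between requests.
async function fetchAllWithDelay<T, C>(
  fetchPage: (cursor?: C) => Promise<Page<T, C>>,
  delayMs = 1000
): Promise<T[]> {
  const all: T[] = [];
  let cursor: C | undefined = undefined;
  do {
    const page: Page<T, C> = await fetchPage(cursor);
    all.push(...page.items);
    cursor = page.nextCursor;
    // Wait only when another page remains.
    if (cursor !== undefined) await sleep(delayMs);
  } while (cursor !== undefined);
  return all;
}
```

If the error still appears with a generous delayMs, as reported here, request rate alone is an unlikely explanation.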

thomasdao avatar Dec 14 '23 23:12 thomasdao

I'm also running into this error. Subscription seems to work fine for a while and then gets dropped with the same RPC 'Listen' stream transport error. Any ideas on what this might be or where to catch the error?

phileasthefogg avatar Feb 05 '24 06:02 phileasthefogg

Same issue after upgrading AngularFire to 17.0.1, which depends on firebase ^10.7.0.

Queries in one of our projects became slower and occasionally run into the @firebase/firestore: Firestore (10.7.2): WebChannelConnection RPC 'Listen' stream error. The other, smaller project works fine.

Tried experimentalForceLongPolling, mentioned in #7968, but no luck. Downgrading to 10.6.0 seems to resolve the issue.
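
For readers unfamiliar with the setting, forcing long polling is configured when Firestore is initialized. A minimal configuration sketch, assuming the modular API of the JS SDK; the project ID is a placeholder, and as noted above the setting did not help in this case.

```typescript
import { initializeApp } from "firebase/app";
import { initializeFirestore } from "firebase/firestore";

// Placeholder config: replace with your own project's values.
const app = initializeApp({ projectId: "your-project-id" });

// Custom settings must be supplied before any other Firestore usage
// for this app (i.e. before getFirestore() is called).
const db = initializeFirestore(app, {
  experimentalForceLongPolling: true,
});
```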

IvanKYW avatar Feb 06 '24 01:02 IvanKYW

I've also been seeing the same issue with hanging snapshot queries for a while, with the same type of WebChannelConnection RPC 'Listen' stream ... transport error.

Sometimes, after failing with the error, the snapshot query retries and returns correct data after a couple of minutes, but most times it just hangs indefinitely. In our case, it only happens with queries that would return a large amount of data (hundreds of docs containing fairly large strings).

The issues started with versions 10.4.x. They were then fixed in versions 10.6.x, but are now back again with 10.7.x. I've also tested the latest 10.8.0, and the issue is still there. As a summary:

  • 10.3.1: issue not present
  • 10.4.x: issue shows up
  • 10.5.x: issue still present
  • 10.6.x: issue fixed
  • 10.7.x / 10.8.0: issue shows up again

Using experimentalForceLongPolling does not seem to make a difference.

I wasn't able to reproduce it in a local or staging environment, as it only seems to show up in our production environment, where we have around 40K snapshot listeners / 10K active connections, as reported in the Firebase console.

ghinda avatar Feb 07 '24 11:02 ghinda

I'm also running into this error since upgrading to v10.7.0, and much like @phileasthefogg, getting the same RPC 'listen' stream transport error. This is a small project (< 10 active connections at a time), and I'm able to reproduce it in both local and production environments.

MrDavidRios avatar Feb 14 '24 05:02 MrDavidRios

Hi @ehsannas, not sure if you have been able to work on this issue? Maybe @MarkDuckworth can take a look. This issue has prevented us from updating to the latest version. Thank you!

thomasdao avatar Feb 23 '24 03:02 thomasdao

The same issue happens in my project (using Flutter). In the beginning everything was fine (I've been using Firestore for about 6 months), but now I'm suddenly getting it all the time. Maybe the data set has grown, and I didn't experience it before because the database was smaller.

alex-dokienko avatar Feb 24 '24 18:02 alex-dokienko

@MrDavidRios Would you be able to share your project in which you're able to consistently reproduce this issue? (feel free to point me to a github repo). Thanks!

ehsannas avatar Feb 26 '24 19:02 ehsannas

This phenomenon seems to be more likely to occur in a slow network environment. By setting throttling to "Fast 3G" or "Slow 3G" in the Network tab of DevTools, we were able to reproduce it even in an environment where it does not usually occur.

hiroro-work avatar Mar 03 '24 07:03 hiroro-work

(note to googlers: this may be related to support case b/325591749, which reports similar webchannel issues when the network is throttled)

dconeybe avatar Mar 04 '24 15:03 dconeybe

Same thing happens in our project. Unfortunately I can't downgrade to firebase 10.6.0 (without much effort) because of AngularFire and Angular dependencies. It still happens on firebase 10.9.0 ...

jorgsiegel avatar Mar 27 '24 21:03 jorgsiegel

This issue has been happening since December last year and affects multiple projects, but it has not received any update. I'm on the Blaze plan but cannot update the library to the latest version, which is really frustrating. Could you please share whether any of you are investigating this issue? Thank you! @MarkDuckworth @dconeybe @ehsannas

thomasdao avatar Apr 04 '24 02:04 thomasdao

@thomasdao, I'll touch base with the team and see if I can move this forward.

MarkDuckworth avatar Apr 04 '24 02:04 MarkDuckworth

This problem affects users in our production apps. We are also in the middle of developing a new app and can consistently reproduce the error. It seems to be connected to the size of Firestore documents. Our documents are max. 300,000 bytes, which is far below the limit specified on the official Firestore documentation page (1 MiB / 1,048,576 bytes) and we are fetching max. 40 documents in a single query.

We would highly appreciate it if the Firebase team could check what changed in recent versions and fix it soon.

jorgsiegel avatar Apr 04 '24 13:04 jorgsiegel

Thank you @MarkDuckworth.

Just to second @thomasdao & @jorgsiegel: this has long been part of the stable releases and affects our users. For various reasons we are unable to downgrade. We have a long-standing GCP ticket open regarding this. I have a feeling this happens more often the bigger the result set is. We run an SPA where we stream about 5000 documents, all well in the region of 1KB. When the queries fail, they restart over and over, resulting in the client downloading 100MB for what should be 5MB. We have no workaround for this.

We would really appreciate seeing some progress here.

Valansch avatar Apr 04 '24 13:04 Valansch

We're also encountering this issue (running 10.8)

Tried 10.11 and it's still happening, but as suggested above downgrading to 10.6 fixed it

valeriangalliat avatar Apr 12 '24 01:04 valeriangalliat

I have a potential fix for this issue. Would anyone be willing/able to test it out? The fix is in https://github.com/firebase/firebase-js-sdk/pull/8145 (NOTE: it is still a work-in-progress). Please comment on the PR with the outcome of your experiment (rather than commenting here on the issue).

You will need to build the Firestore SDK yourself but, thankfully, it's relatively straightforward.

  1. npm install -g yarn
  2. git clone --depth 100 https://github.com/firebase/firebase-js-sdk.git (if using an existing clone of this repo, make sure you're at a commit that includes #8145) ~git clone -b dconeybe/WebChannelOnOpenFix_Bug325591749 --depth 100 https://github.com/firebase/firebase-js-sdk.git~
  3. cd firebase-js-sdk
  4. yarn
  5. yarn build
  6. cd packages/firestore
  7. yarn build:debug
  8. cp -r dist ~/YOUR_PROJECT/node_modules/@firebase/firestore
  9. rebuild your project and test it out

Note that the --depth 100 argument to git is just an optimization to pull about 8MB instead of 30MB. Feel free to omit that argument.

Note that the extra yarn build:debug command is optional, and produces Firestore's index.esm2017.js with all of the code mangling, code stripping, and optimizations disabled. This will produce more readable compiled code and stack traces without mangled names that are much easier to make sense of.

The "cp" command will copy the compiled Firestore JavaScript bundles into your own project's node_modules directory, clobbering the ones that npm downloaded. Make sure to restore the production version (e.g. by deleting the node_modules directory and re-running npm install) when done testing out this fix.

dconeybe avatar Apr 12 '24 20:04 dconeybe

@thomasdao, I have a branch (markduckworth/debug-webchannel-stat-events) that will log additional events from WebChannel. This logging is showing some useful additional info before a WebChannelConnection transport error on my device.

Can you test with this branch on your local reproduction and provide me with any log statements for "STAT_EVENT"? If these events appear before the WebChannelConnection RPC 'Listen' stream 0x269fb953 transport errored event, please include those log lines too.
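
One way to pull the requested lines out of a saved console log is sketched below. The exact log format is an assumption: the sketch relies only on the substrings "STAT_EVENT" and "transport errored" quoted in this thread, and statEventsBeforeErrors is a hypothetical helper name.

```typescript
// Hypothetical helper: for each "transport errored" line, returns that line
// preceded by the STAT_EVENT lines logged since the previous transport error.
function statEventsBeforeErrors(logText: string): string[][] {
  const groups: string[][] = [];
  let pending: string[] = [];
  for (const line of logText.split(/\r?\n/)) {
    if (line.includes("STAT_EVENT")) {
      pending.push(line);
    } else if (line.includes("transport errored")) {
      groups.push([...pending, line]);
      pending = [];
    }
  }
  // STAT_EVENT lines after the last error are intentionally dropped.
  return groups;
}
```

Each returned group ends with one transport error line, so the events leading up to different errors can be compared side by side.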

Your help is greatly appreciated.

MarkDuckworth avatar Apr 22 '24 21:04 MarkDuckworth

@MarkDuckworth I checked out your branch and followed the instructions from https://github.com/firebase/firebase-js-sdk/issues/7860#issuecomment-2052471034. Please see the attached log, thanks!

firebase_log.txt

thomasdao avatar Apr 22 '24 23:04 thomasdao

Thanks @thomasdao.

In my local tests, when I see WebChannelConnection RPC 'Listen' stream X transport errored: ..., the STAT_EVENT logging shows that the root cause was expected/normal. Furthermore I saw the SDK recover gracefully.

In your logs, the STAT_EVENTs leading up to the WebChannelConnection error are different. I'm trying to understand why. The repro that you previously shared with me is not currently reproducing this error. Does that shared repo still reproduce the issue for you?

MarkDuckworth avatar Apr 23 '24 17:04 MarkDuckworth

Also @thomasdao, can you provide the Firebase project ID you used when creating firebase_log.txt? Is it the same project ID from your shared repro? We want to review server logs.

MarkDuckworth avatar Apr 23 '24 21:04 MarkDuckworth

@MarkDuckworth

"The repro that you previously shared with me is not currently reproducing this error. Does that shared repo still reproduce the issue for you?"

Yes, I can still reproduce this issue. Sometimes the query can complete, but the next time I run it again, the query would hang.

"Is it the same project ID from your shared repro?"

Yes it's the same project ID.

thomasdao avatar Apr 24 '24 08:04 thomasdao

Version 10.11.1 was released today and rolls back the WebChannel config to be equivalent to the 10.6 (and 10.5.2) releases. I have tested with @thomasdao's reproduction and I'm seeing the queries complete consistently and quickly. Errors WebChannelConnection RPC 'Listen' stream 0x269fb953 transport errored: Wn {type: 'c', target: Hn, g: Hn, defaultPrevented: false, status: 1} were not observed.

MarkDuckworth avatar Apr 25 '24 21:04 MarkDuckworth

@MarkDuckworth thank you, I tried 10.11.1 and found that the query completes quickly.

Just curious, is WebChannel really superior to the FetchXmlHttpFactory? What's the problem with FetchXmlHttpFactory?

thomasdao avatar Apr 26 '24 04:04 thomasdao

Friends, this has already been fixed by the Firebase team in the newest version, 10.11.1 (April 25, 2024):

Cloud Firestore: Prevent spurious "Backend didn't respond within 10 seconds" errors when network is in fact responding, but slowly. See GitHub PR #8145.

https://firebase.google.com/support/release-notes/js

IslamElKassas avatar Apr 27 '24 09:04 IslamElKassas

Wed 14 Aug 2024 - Still happening in "firebase": "^10.12.5". This issue is constant, to the point where Firebase (and therefore the app) is completely unusable. Neither downgrading to 10.6 nor using the following settings fixed the issue:

experimentalForceLongPolling: true, useFetchStreams: false,

Can somebody shed some light on what's going on with this issue? Is it even being addressed? It has never been a problem before, and I've been using Firebase for 3 or 4 years now.

thesoicalapp91 avatar Aug 14 '24 22:08 thesoicalapp91