nodejs-firestore
RST_STREAM error keeps showing up
Environment details
- OS: macOS Catalina 10.15.5 Beta (19F53f)
- Node.js version: 13.7.0
- npm version: 6.13.6
- @google-cloud/firestore version: 3.7.4
Steps to reproduce
This error keeps appearing over and over in my logs (not regularly reproducible):
Error: 13 INTERNAL: Received RST_STREAM with code 2
at Object.callErrorFromStatus (/api/node_modules/@grpc/grpc-js/src/call.ts:81)
at Object.onReceiveStatus (/api/node_modules/@grpc/grpc-js/src/client.ts:324)
at Object.onReceiveStatus (/api/node_modules/@grpc/grpc-js/src/client-interceptors.ts:439)
at Object.onReceiveStatus (/api/node_modules/@grpc/grpc-js/src/client-interceptors.ts:402)
at Http2CallStream.outputStatus (/api/node_modules/@grpc/grpc-js/src/call-stream.ts:228)
at Http2CallStream.maybeOutputStatus (/api/node_modules/@grpc/grpc-js/src/call-stream.ts:278)
at Http2CallStream.endCall (/api/node_modules/@grpc/grpc-js/src/call-stream.ts:262)
at ClientHttp2Stream.<anonymous> (/api/node_modules/@grpc/grpc-js/src/call-stream.ts:532)
at ClientHttp2Stream.emit (events.js:315)
at ClientHttp2Stream.EventEmitter.emit (domain.js:485)
at emitErrorCloseNT (internal/streams/destroy.js:76)
at processTicksAndRejections (internal/process/task_queues.js:84)
@jakeleventhal Thanks for filing this. Do you know at all what type of request/API call this is related to?
The only place I use gRPC is via Firestore, so it must be something from there. I don't really have more info, unfortunately.
I also got this one with firebase-admin 8.12.1:
{"error":"Error: 13 INTERNAL: Received RST_STREAM with code 2
at Object.callErrorFromStatus (/usr/src/app/node_modules/@grpc/grpc-js/build/src/call.js:30:26)
at Object.onReceiveStatus (/usr/src/app/node_modules/@grpc/grpc-js/build/src/client.js:175:52)
at Object.onReceiveStatus ...
This is a strange, unpredictable issue; it is thrown during delete as well as create or any other request type...
All document modifications (e.g. delete(), create()) use the same underlying request type. This request type is not safe for automatic retry, so unfortunately we cannot simply retry these errors.
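In the meantime, one stopgap is to make the write itself safe to retry: pin the document ID on the client and use create(), so a retried request whose first attempt actually reached the backend fails with ALREADY_EXISTS and can be treated as success. A minimal sketch, assuming a hypothetical createWithManualRetry helper (not part of the SDK):

// Hypothetical helper: the ID is generated client-side once, so every retry
// targets the same document and a duplicate write cannot be created.
const {Firestore} = require('@google-cloud/firestore');
const firestore = new Firestore();

async function createWithManualRetry(collectionPath, data, maxAttempts = 3) {
  const ref = firestore.collection(collectionPath).doc();
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await ref.create(data);
      return ref;
    } catch (err) {
      if (err.code === 6) return ref; // ALREADY_EXISTS: an earlier attempt succeeded
      if (err.code !== 13 || attempt === maxAttempts) throw err; // only retry INTERNAL
    }
  }
}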
Hi @schmidt-sebastian.
I can confirm the request type is not safe to retry: retrying the same call many times (and so, I guess, the same request) has no significant effect.
But there is no error after the Node.js instance is restarted; I guess a gRPC subroutine is hanging.
After a while, the errors mentioned by @jakeleventhal are thrown, it is no longer possible to persist data in Firestore, and an instance restart is required. That's a real problem, folks 😄 FYI, I have no issue using the Firestore delete() and create() methods with Go...
@skonx To rule out that this is a problem with our backend, can you send me your project ID and the time at which you saw these repeated failures? My email is mrschmidt<at>google.com.
If it is not a backend issue, we likely need gRPC logs from you to further diagnose these failures. These can be obtained by setting two environment variables: GRPC_TRACE=all and GRPC_VERBOSITY=DEBUG.
Note that this creates a lot of logs, so hopefully this is not something we have to look at.
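For reference, a rough sketch of enabling those variables from inside the process, assuming @grpc/grpc-js reads them when it is first loaded (exporting them in the shell before starting Node works just as well):

// Set the trace variables before the Firestore client (and therefore
// @grpc/grpc-js) is required for the first time.
process.env.GRPC_TRACE = 'all';
process.env.GRPC_VERBOSITY = 'DEBUG';

const {Firestore} = require('@google-cloud/firestore');

// From here on, gRPC channel and stream events are written to stderr.
const firestore = new Firestore();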
Yep. I've replied to your google.com email address. I will enable the environment variables ASAP and keep you posted.
Hello! We also experience the same error usually coupled with
Error: 8 RESOURCE_EXHAUSTED: Bandwidth exhausted
We noticed that it usually fails all at once and then recovers very quickly.
We are also affected by this error. A month before, it was a sporadic error, maybe once a week, and now we are seeing it many times per day.
I've also been experiencing the issue for over a month now, with a couple hundred to a few thousand errors per day:
- Error: 13 INTERNAL: Received RST_STREAM with code 2
- Error: 14 UNAVAILABLE: Stream refused by server
@schmidt-sebastian I opened a support case if you need a project ID or other information: 23689958.
Same problem here: Received RST_STREAM with code 2
We were able to resolve the Error: 8 RESOURCE_EXHAUSTED: Bandwidth exhausted by upgrading to the nodejs12 runtime on App Engine and updating Electron to the newly released v9. So far we have not seen it again in production, since the root cause was related to this Node.js patch:
https://nodejs.org/en/blog/release/v12.14.1/
[9fd6b5e98b] - http2: fix session memory accounting after pausing (Michael Lehenbauer) #30684
After the update, the various other DB related errors have also gone away, including Received RST_STREAM with code 2.
We currently use Electron in a Google App Engine project and work in the standard runtime. Google has not updated its images to include some necessary libs for the new version of Electron. We were able to work around this as Google supports these libs in Puppeteer, so we installed Puppeteer and sent a deploy off to our production server (on a no-promote flag). After doing so, our best guess is that Google rebuilt our server images with the needed libs to run Puppeteer, which in turn allowed us to run the new version of Electron.
I hope this information helps! We spent a lot of time diving into this, so if you have any questions, feel free to respond below.
Our backend team has looked at the projects that were sent to us. There are no indications that your errors correlate with errors on the backend, which makes it likely that this is a client side issue. I will continue to investigate possible root causes.
As of v3.8.5, we are now retrying RunQuery requests that fail with RST_STREAM. If this error shows up again, we can evaluate expanding our retry to other SDK methods as well.
@schmidt-sebastian I've been running on v3.8.5 for the entire day, and I still see RST_STREAM.
I checked my code to see for which usage patterns it occurs:
- await reference.set()
- await batch.commit()
- await reference.update()
Can we please reopen this issue for visibility?
@merlinnot We continue to receive pushback on retrying RST_STREAM/INTERNAL for Commit RPCs. I hope your comment is enough to convince the respective folks that we should retry.
Thank you. If needed, I can provide a histogram or any other details, just let me know.
This is happening to a project I'm working on as well, using nodejs 10 runtime in Google Cloud functions (via firebase). I have four scripts that fail at least once a day - they make await reference.set() calls to firestore as the script executes. At the moment I don't have any retry logic. The scripts run every two minutes. They make at most 20 set calls.
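Until the SDK retries these writes itself, one application-level fallback relies on the fact that a full-document set() with the same payload is effectively idempotent, so re-sending it is harmless. A sketch with a made-up setWithBackoff helper:

// Hypothetical helper: retry a full-document set() on INTERNAL (13) and
// UNAVAILABLE (14), backing off exponentially between attempts.
async function setWithBackoff(ref, data, maxAttempts = 5) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await ref.set(data);
    } catch (err) {
      const retryable = err.code === 13 || err.code === 14;
      if (!retryable || attempt >= maxAttempts) throw err;
      // Wait 200 ms, 400 ms, 800 ms, ... before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, 100 * 2 ** attempt));
    }
  }
}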
I was able to reproduce this with GRPC trace logging enabled. I hope this will allow us to make progress here:
2020-06-12T21:42:31.665Z | call_stream | [1518132] attachHttp2Stream from subchannel 172.217.214.95:443
2020-06-12T21:42:31.665Z | subchannel | 172.217.214.95:443 callRefcount 99 -> 100
2020-06-12T21:42:31.665Z | call_stream | [1518132] sending data chunk of length 130 (deferred)
2020-06-12T21:42:31.665Z | call_stream | [1518132] calling end() on HTTP/2 stream
Firestore (3.8.1) 2020-06-12T21:42:31.714Z bH7ni [WriteBatch.commit]: Sending 1 writes
Firestore (3.8.1) 2020-06-12T21:42:31.714Z bH7ni [ClientPool.acquire]: Creating a new client
Firestore (3.8.1) 2020-06-12T21:42:32.284Z ##### [Firestore]: Initialized Firestore GAPIC Client
Firestore (3.8.1) 2020-06-12T21:42:32.284Z bH7ni [Firestore.request]: Sending request: {"database":"projects/xyz/databases/(default)","writes":[{"update":{"name":"projects/xyz/databases/(default)/documents/a/DpuiiezGapQVLrrdqFfo","fields":{}},"currentDocument":{"exists":false}}]}
Firestore (3.8.1) 2020-06-12T21:42:32.548Z G31QV [WriteBatch.commit]: Sending 1 writes
Firestore (3.8.1) 2020-06-12T21:42:32.548Z G31QV [ClientPool.acquire]: Re-using existing client with 99 remaining operations
Firestore (3.8.1) 2020-06-12T21:42:32.548Z G31QV [Firestore.request]: Sending request: {"database":"projects/xyz/databases/(default)","writes":[{"update":{"name":"projects/xyz/databases/(default)/documents/a/Ua5mu1wRhNRExZLrX6K5","fields":{}},"currentDocument":{"exists":false}}]}
Firestore (3.8.1) 2020-06-12T21:42:32.599Z YsE7h [WriteBatch.commit]: Sending 1 writes
Firestore (3.8.1) 2020-06-12T21:42:32.599Z YsE7h [ClientPool.acquire]: Re-using existing client with 98 remaining operations
Firestore (3.8.1) 2020-06-12T21:42:32.599Z YsE7h [Firestore.request]: Sending request: {"database":"projects/xyz/databases/(default)","writes":[{"update":{"name":"projects/xyz/databases/(default)/documents/a/EqrauWpiU7wXdJWnFJyb","fields":{}},"currentDocument":{"exists":false}}]}
...
2020-06-12T21:42:33.833Z | call_stream | [1518033] HTTP/2 stream closed with code 2
2020-06-12T21:42:33.833Z | call_stream | [1518033] ended with status: code=13 details="Received RST_STREAM with code 2"
2020-06-12T21:42:33.833Z | subchannel | 172.217.214.95:443 callRefcount 100 -> 99
2020-06-12T21:42:33.833Z | call_stream | [1518034] HTTP/2 stream closed with code 2
2020-06-12T21:42:33.833Z | call_stream | [1518034] ended with status: code=13 details="Received RST_STREAM with code 2"
2020-06-12T21:42:33.833Z | subchannel | 172.217.214.95:443 callRefcount 99 -> 98
2020-06-12T21:42:33.833Z | call_stream | [1518035] HTTP/2 stream closed with code 2
2020-06-12T21:42:33.833Z | call_stream | [1518035] ended with status: code=13 details="Received RST_STREAM with code 2"
2020-06-12T21:42:33.833Z | subchannel | 172.217.214.95:443 callRefcount 98 -> 97
2020-06-12T21:42:33.833Z | call_stream | [1518036] HTTP/2 stream closed with code 2
2020-06-12T21:42:33.833Z | call_stream | [1518036] ended with status: code=13 details="Received RST_STREAM with code 2"
2020-06-12T21:42:33.833Z | subchannel | 172.217.214.95:443 callRefcount 97 -> 96
...
2020-06-12T21:42:33.838Z | subchannel | 172.217.214.95:443 callRefcount 1 -> 0
2020-06-12T21:42:33.839Z | subchannel | 172.217.214.95:443 connection closed with error read ECONNRESET
2020-06-12T21:42:33.839Z | subchannel | 172.217.214.95:443 READY -> IDLE
2020-06-12T21:42:33.839Z | subchannel | 172.217.214.95:443 refcount 2 -> 1
2020-06-12T21:42:33.839Z | pick_first | READY -> IDLE
2020-06-12T21:42:33.839Z | resolving_load_balancer | dns:firestore.googleapis.com:443 READY -> IDLE
2020-06-12T21:42:33.839Z | connectivity_state | dns:firestore.googleapis.com:443 READY -> IDLE
Firestore (3.8.1) 2020-06-12T21:42:33.842Z mQ1lw [Firestore.request]: Received error: { Error: 13 INTERNAL: Received RST_STREAM with code 2
at Object.callErrorFromStatus (/home/mrschmidt/test/node_modules/@grpc/grpc-js/build/src/call.js:30:26)
at Object.onReceiveStatus (/home/mrschmidt/test/node_modules/@grpc/grpc-js/build/src/client.js:175:52)
at Object.onReceiveStatus (/home/mrschmidt/test/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:341:141)
at Object.onReceiveStatus (/home/mrschmidt/test/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:304:181)
at Http2CallStream.outputStatus (/home/mrschmidt/test/node_modules/@grpc/grpc-js/build/src/call-stream.js:116:74)
at Http2CallStream.maybeOutputStatus (/home/mrschmidt/test/node_modules/@grpc/grpc-js/build/src/call-stream.js:155:22)
at Http2CallStream.endCall (/home/mrschmidt/test/node_modules/@grpc/grpc-js/build/src/call-stream.js:141:18)
at ClientHttp2Stream.stream.on (/home/mrschmidt/test/node_modules/@grpc/grpc-js/build/src/call-stream.js:403:22)
at emitOne (events.js:116:13)
at ClientHttp2Stream.emit (events.js:211:7)
code: 13,
details: 'Received RST_STREAM with code 2',
metadata: Metadata { internalRepr: Map {}, options: {} } }
I got the error once today, maybe it helps:
Error: 13 INTERNAL: Received RST_STREAM with code 2
at Object.callErrorFromStatus (/srv/node_modules/@grpc/grpc-js/build/src/call.js:30:26)
at Object.onReceiveStatus (/srv/node_modules/@grpc/grpc-js/build/src/client.js:175:52)
at Object.onReceiveStatus (/srv/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:341:141)
at Object.onReceiveStatus (/srv/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:304:181)
at Http2CallStream.outputStatus (/srv/node_modules/@grpc/grpc-js/build/src/call-stream.js:115:74)
at Http2CallStream.maybeOutputStatus (/srv/node_modules/@grpc/grpc-js/build/src/call-stream.js:154:22)
at Http2CallStream.endCall (/srv/node_modules/@grpc/grpc-js/build/src/call-stream.js:140:18)
at ClientHttp2Stream.stream.on (/srv/node_modules/@grpc/grpc-js/build/src/call-stream.js:396:22)
at emitOne (events.js:116:13)
at ClientHttp2Stream.emit (events.js:211:7)
at emit (internal/http2/core.js:177:8)
at _combinedTickCallback (internal/process/next_tick.js:142:11)
at process._tickDomainCallback (internal/process/next_tick.js:219:9)
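// The Cloud Functions trigger in which the error above was observed;
// `increment` is assumed to be a FieldValue.increment() sentinel defined elsewhere in the file.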
exports.onTicketMessageCreate = functions.firestore
.document('tickets/{ticketId}/messages/{messageId}')
.onCreate(async (snap, context) => {
const message = snap.data();
await snap.ref.parent.parent?.update({
messages_count: increment,
'dates.last_message': message?.dates.created,
});
});
This might be a problem with @grpc/grpc-js, as I don't use nodejs-firestore and also face this error once in a while. It happens with stale connections.
I'm also seeing very intermittent Received RST_STREAM with code 2, which in my case is followed by an endless pile of Failed to start HTTP/2 stream and Error: Exceeded maximum number of retries allowed errors until the server process is terminated. I'm running Node.js 14 in a GKE pod. I'm not sure why the client can't automatically recover; it feels gross to set up a livenessProbe to kill the process just because of this issue.
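A rough sketch of that livenessProbe workaround, assuming an Express server and a Kubernetes probe pointed at a /healthz path (the path, probe document, and timeout are made up for illustration):

const express = require('express');
const {Firestore} = require('@google-cloud/firestore');

const app = express();
const firestore = new Firestore();

// Liveness endpoint: attempt a tiny read with a deadline. If the client is
// wedged (e.g. "Exceeded maximum number of retries allowed"), the read never
// completes in time, the probe fails, and Kubernetes restarts the pod.
app.get('/healthz', async (req, res) => {
  const deadline = new Promise((_, reject) =>
    setTimeout(() => reject(new Error('Firestore health check timed out')), 5000)
  );
  try {
    await Promise.race([firestore.doc('health/probe').get(), deadline]);
    res.status(200).send('ok');
  } catch (err) {
    res.status(500).send(err.message);
  }
});

app.listen(8080);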
I filed b/160000476 (Google internal). I do hope that we can get some traction.
@schmidt-sebastian Do you have some updates on the issue? On my side, just regular outages, nothing new 😂
Same...
Had the same issue today, Received RST_STREAM with code 2 followed by Failed to start HTTP/2 stream.
Running on GKE, we had to recycle the pod to fix the outage.
@google-cloud/firestore version 4.0.0
Ran into the same Error: 13 INTERNAL: Received RST_STREAM with code 2
Running Node.js 10 in Google Cloud Functions. I was trying to delete a document from Firestore. If it helps, the function is cron-called via HTTP. The only thing in the document was an array containing fifty elements. I have redeployed to the Node.js 8 environment and have not seen the issue since. Has anyone else tried different Node versions and had any success?
@JustinBeckwith @murgatroid99 I saw your involvement in https://github.com/nodejs/help/issues/2105, which seems related at first glance. Thought I'd tag you here in case it might be somewhat useful, especially if it's an issue with Node.js or @grpc/grpc-js.
I did more research on this topic with trace logs enabled for gRPC channels (don't blame me for killing Google Cloud Logging 😉).
What I see in my logs is:
- First, I see an abnormally high number of Pick result: QUEUE subchannel: undefined status: undefined undefined. Regularly, when the application is healthy, I usually see that next to dns:firestore.googleapis.com:443 createCall [251818] method="/google.firestore.v1.Firestore/Listen". Please see the two screenshots below: the first one is "regular" (the application works fine), the second one is from right before further errors happen.
- Then I see an error thrown a couple of times: Error: 13 INTERNAL: Received RST_STREAM with code 2. FYI, "disconnecting" on the screenshot is my application's log statement, not related to the issue.
- Then I can still see a couple (not a lot, maybe 20?) of logs without errors.
- Then I can see failures due to the session being destroyed: Failed to start call on picked subchannel 216.58.205.42:443 with error The session has been destroyed. Ending call. That error is logged ~200 times.
- Next, it goes back to Pick result: QUEUE subchannel: undefined status: undefined undefined next to createCall.
- Then again Pick result: QUEUE subchannel: undefined status: undefined undefined, but this time with no createCall. That happens a couple hundred times.
- I can then see some Pick result: COMPLETE subchannel: 216.58.198.170:443 status: undefined undefined, maybe a few dozen.
- It then starts happening next to createCall.
- Next, it starts going between QUEUE/COMPLETE.
- And then the worst starts: Failed to start call on picked subchannel 216.58.204.42:443 with error New streams cannot be created after receiving a GOAWAY. Retrying pick. It goes on forever until the node is restarted.
For any Googler who'd like to examine the logs themselves:
- Support case ID: 23689958.
- Logging query: (resource.type="gce_instance" AND resource.labels.instance_id="1029783437173675621") OR (resource.type="global" AND jsonPayload.instance.id="1029783437173675621")
- Timestamp where it all starts: 2020-07-02T06:28:41; you can also see timestamps in my screenshots above.