firebase-js-sdk
[Web] Support gRPC configuration in Web Client SDK
Operating System
Debian GNU/Linux 11 (bullseye) Linux 6.1.21-v8+
Browser Version
Node 16.20.0
Firebase SDK Version
10.8.1
Firebase SDK Product:
Auth, Firestore
Describe your project's tooling
esbuild ^0.19.5
Describe the problem
When running in a Node.js environment (other environments may also be affected), the Firebase JavaScript SDK takes in excess of 15 minutes to detect a silently broken network connection (e.g. a NAT entry being erased or traffic blocking being applied). Until detection, Firestore listeners on the device receive no updates and writes never reach the server.
Steps and code to reproduce issue
Recommended: set the environment variables `GRPC_VERBOSITY=debug GRPC_TRACE=all` for better observation of gRPC activity.
Launch a Node.js program with at least one Firestore listener.
Block internet traffic at the router, or apply firewall rules blocking traffic to/from your device (or just the specific connection to the Firestore backend), to simulate a silently killed network connection.
Observe how long it takes for a message such as the following to appear:
[2024-02-29T09:50:17.531Z] @firebase/firestore: Firestore (10.3.1): GrpcConnection RPC 'Listen' stream 0x6ba5ef16 error. Code: 14 Message: 14 UNAVAILABLE: read ETIMEDOUT
[2024-02-29T09:50:17.531Z] @firebase/firestore: Firestore (10.3.1): GrpcConnection RPC 'Listen' stream 0x40aaf17c error. Code: 14 Message: 14 UNAVAILABLE: read ECONNRESET
Or from gRPC:
D 2024-02-29T10:50:17.497Z | subchannel_call | [7] Node error event: message=read ECONNRESET code=ECONNRESET errno=Unknown system error -104 syscall=read
It appears, based on tcpdump, that the Firestore client doesn't even send keepalives to the backend server. The server sends keepalives every 45 seconds, but the client is either unable or not configured to watch for a certain number of missed keepalives before trying to reconnect.
In my test case, it took 32 minutes before the ECONNRESET was triggered by Node, leading gRPC to start trying to reconnect.
Once the block was removed and the device was allowed to reconnect, my Firestore listeners never resumed working, even after the gRPC keepalives started flowing again.
Hi @delaneyb ,
If I understand correctly, there are two issues you described in the ticket:
- When the network connection is broken, it takes too long for the SDK to surface error messages like the following:
@firebase/firestore: Firestore (10.3.1): GrpcConnection RPC 'Listen' stream 0x6ba5ef16 error. Code: 14 Message: 14 UNAVAILABLE: read ETIMEDOUT
- When the network comes back, Firestore listeners (onSnapshot) do not receive the latest snapshot.
Please let me know if I have misstated anything in this summary.
Hi @cherylEnkidu,
That is correct.
In regards to 2., note that the `onError` and `onCompletion` callbacks for the `onSnapshot` listener are never called, so it is reasonable to expect the listener to self-recover and continue working once the network connection is reestablished.
Also, it is important to distinguish between scenarios where the socket is closed and the application is notified of this, and scenarios where the connection silently goes dead due to intervening infrastructure or software:
- If I use gdb to close the connection as if it had been closed from the other end (leading to `@firebase/firestore: Firestore (10.8.1): GrpcConnection RPC 'Listen' stream 0x66551d64 error. Code: 1 Message: 1 CANCELLED: Call cancelled`), the SDK or gRPC seems to immediately reestablish a new connection and everything continues functioning as normal. (Attach with `sudo gdb -p $(pgrep node)`, then use `call (int)shutdown(46, 0)` followed by `c` to resume the program, replacing 46 with the fd of the connection to the Firestore backend, which can be found via `sudo lsof -nP -iTCP:443 -a -c node`.)
- If we instead add a firewall rule on the router that drops packets to and from the Firestore backend while the socket remains open and functioning as far as the SDK is concerned, this is where the problems occur. In this scenario I observed it taking 32 minutes just for Node to fire the ECONNRESET. We are also dealing with a situation, which we believe is caused by the client's firewall, where the code we get is instead ETIMEDOUT; either way, we have confirmed instances of this occurring as long as 15 minutes after the connection actually stopped working.
Hi @delaneyb ,
I consulted our gRPC team, and their suggestions are as follows:
Sometimes the client doesn't see connections drop, for whatever reason. The gRPC keepalive functionality can help here. It can be configured using the client construction options `grpc.keepalive_time_ms`, `grpc.keepalive_timeout_ms`, and `grpc.keepalive_permit_without_calls`.
When configured, the client will wait an amount of time equal to the `grpc.keepalive_time_ms` parameter, then send a ping. If it doesn't get a response within `grpc.keepalive_timeout_ms`, it will consider the connection closed. If `grpc.keepalive_permit_without_calls` is set to 1, it will do this even if there are no streams active.
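To make that concrete, here is a minimal sketch of the channel options described above as they would be passed when constructing a gRPC client directly with `@grpc/grpc-js`. The specific millisecond values are illustrative assumptions, not recommendations, and the Firestore Web SDK currently exposes no way to supply them (which is the point of this feature request):

```javascript
// Sketch of the gRPC keepalive channel options described above. Values are
// illustrative only; the Firestore Web SDK has no hook for passing these.
const keepaliveChannelOptions = {
  // Send a keepalive ping after 30s of inactivity on the connection.
  'grpc.keepalive_time_ms': 30000,
  // Treat the connection as dead if no ping ack arrives within 10s.
  'grpc.keepalive_timeout_ms': 10000,
  // Send pings even when no calls/streams are active (1 = enabled).
  'grpc.keepalive_permit_without_calls': 1,
};

// Usage would look roughly like:
//   new grpc.Client('firestore.googleapis.com:443', creds, keepaliveChannelOptions)
```

With these settings, a silently dead connection would be noticed after roughly `keepalive_time_ms + keepalive_timeout_ms` (about 40 seconds here) instead of the 15+ minutes observed above.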
Hi @cherylEnkidu,
I have found the related issues https://github.com/googleapis/nodejs-firestore/issues/791 and https://github.com/googleapis/nodejs-firestore/issues/1057 in nodejs-firestore; however, firebase/firebase-js-sdk does not seem to expose a `new Firestore()` constructor where we can pass in gRPC settings.
Using @google-cloud/firestore is not viable because it requires IAM/admin service accounts, which we do not want on devices running the Node.js program.
Hi @delaneyb ,
Unfortunately, the Web Client SDK doesn't have a way to configure gRPC settings via Firestore yet. I will mark this ticket as a feature request and track it (b/329681553). Thank you again for your report!
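Purely as a hypothetical illustration of what the requested feature might look like (the `grpcChannelOptions` field does NOT exist in the Web SDK today; only the field shown alongside it is real):

```javascript
// Hypothetical settings object for the Web SDK's initializeFirestore().
// 'grpcChannelOptions' is invented here to make the feature request concrete.
const hypotheticalSettings = {
  // Existing, real FirestoreSettings field shown for context:
  ignoreUndefinedProperties: true,
  // Hypothetical new field carrying gRPC channel options for the Node build:
  grpcChannelOptions: {
    'grpc.keepalive_time_ms': 30000,
    'grpc.keepalive_timeout_ms': 10000,
    'grpc.keepalive_permit_without_calls': 1,
  },
};

// Hypothetical usage: initializeFirestore(app, hypotheticalSettings)
```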