horizon
Managing reconnection
@Tryneus brought up a question of what to do if there are disconnections. There are two types of disconnects:
- The Fusion client (browser) disconnects from the Fusion server
- The Fusion server disconnects from RethinkDB
How do we handle that?
My initial impulse is that in either of those cases, the client-side app should be made aware of disconnections/reconnections, but all event setup should be automatically maintained/recreated if possible (otherwise we're pushing unnecessary boilerplate onto the user).
So:
var ref = new Fusion('localhost');
ref.on('connected', ()=>...)
.on('disconnected', ()=>...);
However, any feeds the client has created should be maintained and should pick up where they left off. This may be very challenging technically until we add restartable feeds (and even after that, it'll be very hard until we add full-blown restartable feeds), so for now we might want to also issue an "emergency" event in case the feeds have been lost irrecoverably:
// Find better name than `all_is_lost`
var ref = new Fusion('localhost');
ref.on('all_is_lost', () => { window.location.reload(); }); // refresh the whole page
Also, /cc @mlucy
Why would the client need to know if the server loses connection to the database momentarily?
Because the client's data might be stale; they need to know that to indicate it to the end-user in the UI.
I think for the moment we should do the easy thing where we let changes back up on the server in a way where clients can reconnect, and if we accumulate more than 100,000 (or whatever the user configures) then we abort the feed and they have to start over. That should handle both types of disconnects in a way that works 99% of the time.
I think disconnects between server and db are too low level, especially if we don't actually expect the client to do anything other than advise the user. As far as the user is concerned the server and the db are the same thing, we should just have one mechanism for saying the backend died, and that's probably just closing the websocket. If the server wants to wait a bit for the db to come back before closing the socket, that's a server detail.
As far as the user is concerned the server and the db are the same thing, we should just have one mechanism for saying the backend died.
Completely agree. I didn't imply that we should emit different events in the client library depending on whether the Fusion server or RethinkDB server die; we should just issue an onDisconnected event in either case.
I was talking to @deontologician about an edge case where the user registers the callbacks after the Fusion client already connected to the Fusion server.
I think in this case it's important we still trigger the connected event, so the user's code to manage connects/disconnects works regardless of when they register the callback.
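To illustrate the late-registration case, here is a minimal sketch of a status object that replays the current state to handlers registered after the connection is already up. `ConnectionStatus` and its methods are hypothetical names for illustration, not part of the Fusion client.

```javascript
// Hypothetical helper, not part of the Fusion client API.
class ConnectionStatus {
  constructor() {
    this.state = 'disconnected'; // current connection state
    this.handlers = { connected: [], disconnected: [] };
  }

  // Register a handler. If we are already connected, a 'connected'
  // handler fires immediately so late registration still sees the event.
  on(event, handler) {
    this.handlers[event].push(handler);
    if (event === 'connected' && this.state === 'connected') {
      handler();
    }
    return this; // allow chaining, as in ref.on(...).on(...)
  }

  // Called by the transport layer whenever the state changes.
  set(state) {
    if (state === this.state) return; // ignore repeated notifications
    this.state = state;
    this.handlers[state].forEach(h => h());
  }
}

const status = new ConnectionStatus();
status.set('connected'); // the connection is established first

const seen = [];
status
  .on('connected', () => seen.push('connected')) // registered late, fires right away
  .on('disconnected', () => seen.push('disconnected'));

status.set('disconnected');
status.set('connected');
console.log(seen);
```

With this arrangement the user's connect/disconnect code behaves the same whether it is registered before or after the first connection.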
Should there be some sort of timeout for the Fusion to RethinkDB disconnection case where after maybe 5 to 10 seconds, it does trigger the disconnected event? It seems like Fusion should handle temporary disconnects, but not indefinitely.
(For clarification, the client can connect to Fusion just fine, but Fusion is partitioned from RethinkDB @coffeemug )
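One way to picture such a timeout is a grace period: the server stays quiet during a short outage and only reports a disconnect if the database is still unreachable when the timer fires. The sketch below is an assumption about how this could look; `makeGraceNotifier`, `dbLost`, `dbRestored` and the 10 second default are made up for illustration.

```javascript
// Hypothetical helper: delays the 'disconnected' notification by a grace period.
// `timers` is injectable so the example can run without real waiting.
const realTimers = {
  setTimeout: (fn, ms) => setTimeout(fn, ms),
  clearTimeout: id => clearTimeout(id),
};

function makeGraceNotifier(onDisconnected, graceMs = 10000, timers = realTimers) {
  let pending = null; // timer handle while the database is unreachable
  return {
    dbLost() {
      if (pending !== null) return; // grace period already running
      pending = timers.setTimeout(() => {
        pending = null;
        onDisconnected(); // the outage outlasted the grace period
      }, graceMs);
    },
    dbRestored() {
      if (pending === null) return;
      timers.clearTimeout(pending); // recovered in time, stay silent
      pending = null;
    },
  };
}

// Fake timer for the demo: fire() plays the role of the grace period elapsing.
const fakeTimers = {
  fn: null,
  setTimeout(fn) { this.fn = fn; return 1; },
  clearTimeout() { this.fn = null; },
  fire() { const fn = this.fn; this.fn = null; if (fn) fn(); },
};

const graceEvents = [];
const notifier = makeGraceNotifier(() => graceEvents.push('disconnected'), 10000, fakeTimers);
notifier.dbLost();
notifier.dbRestored(); // short blip, recovered within the grace period
fakeTimers.fire();     // nothing is scheduled, so nothing happens
notifier.dbLost();
fakeTimers.fire();     // this outage lasted the whole grace period
console.log(graceEvents);
```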
@dalanmiller -- do you mean Fusion server to RethinkDB server? I'm not sure which piece disconnects from which in this scenario.
I think for the moment we should do the easy thing where we let changes back up on the server in a way where clients can reconnect, and if we accumulate more than 100,000 (or whatever the user configures) then we abort the feed and they have to start over.
I think this makes sense. So the cases that could result in a disconnected event would be:
- RethinkDB database disconnects from Fusion server (at least until changefeeds are restartable)
- Client to Fusion web socket is disconnected for more than (say) 10 seconds
- The web socket is disconnected and Fusion receives over 100k events and starts dropping events. 5 seconds later the web socket reconnects, but the buffer has already blown.
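The buffering idea can be sketched as a bounded queue that gives up once the limit is exceeded. This is only an illustration under assumptions: `FeedBuffer`, `push` and `drain` are hypothetical names, and the 100,000 default mirrors the number mentioned above.

```javascript
// Hypothetical server-side buffer; names and default are illustrative.
class FeedBuffer {
  constructor(maxBuffered = 100000) {
    this.maxBuffered = maxBuffered;
    this.pending = [];    // changes waiting for the client to come back
    this.aborted = false; // true once the limit has been exceeded
  }

  // Called for each change that arrives while the client is away.
  // Returns false once the feed has been aborted.
  push(change) {
    if (this.aborted) return false;
    if (this.pending.length >= this.maxBuffered) {
      this.pending = []; // free the memory, the feed is lost
      this.aborted = true;
      return false;
    }
    this.pending.push(change);
    return true;
  }

  // Called when the client reconnects. If ok is false the client
  // has to start the feed over from scratch.
  drain() {
    if (this.aborted) return { ok: false, changes: [] };
    const changes = this.pending;
    this.pending = [];
    return { ok: true, changes };
  }
}

// A short outage: everything fits in the buffer.
const small = new FeedBuffer(3);
['a', 'b', 'c'].forEach(c => small.push(c));
const resumed = small.drain();

// A long outage: the fourth change blows the buffer.
const blown = new FeedBuffer(3);
['a', 'b', 'c', 'd'].forEach(c => blown.push(c));
const lost = blown.drain();

console.log(resumed.ok, resumed.changes.length, lost.ok);
```

In the second case a later reconnect does not help, which matches the third bullet above: the buffer has already blown, so the client gets a disconnected event and starts over.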
We went through a similar thought process to what I'm seeing in here. We wired up re-frame apps to a rethinkdb database via a proxy.
We wanted an FRP/reactive stack from the database right through to the GUI (think Meteor). Our proxy was pretty thin, not doing much more than authentication (via IP whitelisting) and the client apps (browser) could issue full rethinkdb queries and throw them down the wire. So a bit different to fusion, but similar in terms of there being three parts (app / proxy / database) with long-lasting sockets between them that might fail.
On the client side, we ended up with this arrangement:
- app (browser client) code issues queries using a client library.
- for each query made, this client library returns a "Signal" to the app code (I mean Signal in the FRP sense - a value which changes over time - an Observable in Rx terms)
- the value within the Signal (Observable) has two fields: 1. a connection status 2. the collection (the query result)
- via this Signal, the app code receives a stream of updates about BOTH the "connection status" and the collection/query itself.
- in response to a "connection status" of "offline", app code might choose to show an overlay saying "offline" or display "trying to get connection back" or perhaps it disables a button, or whatever makes sense for that app. (Or it completely ignores connection status).
- if the app code sees a query result of, say, "nil", then it knows the query is still being performed and will handle that visually, by perhaps showing "Loading ..." or perhaps it will be showing a GUI twirly thing, etc. Again, the visual representation is up to the app.
- of course, when the query result arrives, the app code will see the change, via the Signal and then display as it should. And, as new values arrive, again the app code handles that in whatever way it wants.
- if the websocket is lost, then it is up to the client library to attempt a reconnect. We didn't want that code in the app code. The app is told about the issue via a change to the "connection status" field in the query Signal.
- so the client library, which manages the websocket, is responsible for updating Signals it previously returned with both new connection status information AND new collection values as they arrive.
Although it was invisible to the app code, our client library had to maintain a small FSM per subscription. There's no getting away from small, stateful machines when there's stateful things, like websocket connections, and retries, even if you are trying hard to be FRP/subscriptions all the way down.
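A per-subscription state machine of the kind described might look roughly like the sketch below. The state names, event names and the `createSubscription` function are assumptions made for illustration; the original library is not shown in this thread.

```javascript
// Hypothetical per-subscription state machine; all names are illustrative.
const TRANSITIONS = {
  connecting: { open: 'loading', close: 'offline' },
  loading:    { result: 'live', close: 'offline' },
  live:       { result: 'live', close: 'offline' },
  offline:    { retry: 'connecting' },
};

function createSubscription(onChange) {
  let state = 'connecting';
  let collection = null; // null means the query result has not arrived yet

  // Push the two-field value (connection status + collection) to the app.
  const emit = () => onChange({
    connectionStatus: state === 'loading' || state === 'live' ? 'online' : state,
    collection,
  });

  return {
    dispatch(event, payload) {
      const next = TRANSITIONS[state][event];
      if (!next) return; // ignore events that make no sense in this state
      state = next;
      if (event === 'result') collection = payload;
      emit();
    },
  };
}

const updates = [];
const sub = createSubscription(value => updates.push(value));
sub.dispatch('open');              // socket is up, query still running
sub.dispatch('result', ['crow']);  // first result arrives
sub.dispatch('close');             // websocket lost, last result is kept
sub.dispatch('result', ['raven']); // ignored while offline
sub.dispatch('retry');             // the library, not the app, reconnects
sub.dispatch('open');
console.log(updates.map(u => u.connectionStatus));
```

The app only ever sees the emitted values, so it can show "offline" or "Loading ..." without knowing anything about sockets or retries.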
@mike-thompson-day8 -- thanks for the writeup -- that's really helpful!
I'll hitch this one to the wagon of #104 since we may get reconnections for free with that. Putting in subsequent
You might enjoy the WebSockHop client library: https://github.com/fanout/websockhop
It speaks pure WebSocket protocol (nothing special needed on the server) and handles automatic reconnect as well as message timeouts.
Oh, I like that, seems very small and useful. We're considering engine.io as well
From @josephg in https://github.com/rethinkdb/horizon/issues/413#issuecomment-220224309
how I've implemented stuff like this in the past is for a reconnecting client to tell the server "I have subscribed to these queries, and am up to date at version X". Then the server replies with either "Here are changes A,B,C to bring you up to date" or "You are too far out of date. Here's a fresh copy of the query result set".
I'll expand on that a little. Horizon shouldn't get a library to magic away reconnections.
Some facts:
- The open network connection is stateful. The state associated with the connection is the jwt token, any active subscriptions and pending messages (if any).
- This state is managed by a server. If the client subscribes to more queries, we need to open more rethinkdb queries. If the client goes away, the rethinkdb queries need to be closed.
If you're using a library which automatically reconnects the client when it comes back, how does that work? Are you putting the client state in a database somewhere? What's the TTL on that database? By what mechanism does the server re-open queries to rethinkdb? Does it keep them open forever 'just in case'? What about if the server restarts?
No matter what happens you'll need horizon-specific code on the server to handle reconnections (resubscribing to the queries that the client has subscribed to). You can either replicate the client-specific state (what queries, what versions they saw last) in another database, or the client can just tell you that state when it reconnects. The latter is faster and simpler and has fewer moving parts.
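The "client tells the server what it has" handshake can be sketched as a single server-side decision: reply with the missed changes if the retained log still covers the client's version, otherwise send a fresh snapshot. Everything here is an assumption for illustration: the `feed` shape, the `catchUp` function and the `maxDelta` limit are not Horizon APIs.

```javascript
// Hypothetical server-side handler; the feed shape and limit are assumptions.
// feed = { version, results, changes: [{ version, ... }] } where `changes`
// is the retained tail of the change log in ascending version order.
function catchUp(feed, clientVersion, maxDelta = 1000) {
  const missed = feed.changes.filter(c => c.version > clientVersion);
  // Oldest version still in the log; with an empty log nothing can be replayed.
  const oldestKept = feed.changes.length ? feed.changes[0].version : feed.version + 1;
  // The log must reach back to the change right after the client's version.
  const gapless = clientVersion >= oldestKept - 1;

  if (gapless && missed.length <= maxDelta) {
    // "Here are changes A, B, C to bring you up to date"
    return { type: 'delta', version: feed.version, changes: missed };
  }
  // "You are too far out of date. Here's a fresh copy of the result set"
  return { type: 'snapshot', version: feed.version, results: feed.results };
}

const feed = {
  version: 5,
  results: [{ id: 1, name: 'crow' }, { id: 2, name: 'raven' }],
  changes: [{ version: 3 }, { version: 4 }, { version: 5 }],
};

const recent = catchUp(feed, 4);  // missed only version 5
const current = catchUp(feed, 5); // already up to date
const stale = catchUp(feed, 1);   // version 2 is no longer in the log
console.log(recent.type, current.type, stale.type);
```

Because the client supplies its own version on reconnect, the server does not need to persist per-client state between connections.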
Does the client already re-subscribe to the same feeds when it reconnects? It seems like everything after that is just optimization.
@download13 the client doesn't resubscribe automatically, you have to recreate everything. They really should resubscribe though
Hello guys. A few questions regarding client reconnection.
Do I need to create a new Horizon instance on each reconnection attempt?
What is the best way to destroy a just-disconnected instance along with all its subscriptions (I guess they are all safely cleaned up, according to the RxJS docs)?
Is it safe to just delete hzInstance?
I can do all this with redux-saga, but maybe @deontologician can suggest some advice or a good practice for implementing reconnection better.
Right now, I think it's safe to create new queries with an existing Horizon connection after reconnecting. It's not going to correctly handle old queries that are already running (it may re-use request_ids which will be buggy). This can get kind of messy, so I wouldn't blame you if you just created a new Horizon instance on each disconnect.
If you're willing to polish this up a bit, I'm going to suggest something kinda radical. I haven't tested this so buyer beware, but this is an Observable scheme to automatically refresh your queries when you disconnect:
// Each value is a function taking a Horizon instance and returning an Observable
const rawAppQueries = {
  userQuery: hz => hz('users').find(3).watch(),
  birdQuery: hz => hz('birds').findAll({family: 'corvid'}).watch(),
  // ... all other watch() queries in your app
  // fetch() and store() etc are one-shot so they won't need to be refreshed.
}
// An observable that gives an infinite stream of new Horizon instances on each disconnect
const hzReconnector = Rx.Observable.create(subscriber => {
  const reconnector = () => {
    let hz = Horizon(hzOptions)
    hz.onDisconnected(reconnector)
    subscriber.next(hz)
  }
  reconnector() // kick off the chain
}).share() // share the Horizon instances between queries, important!
// Now, map each query into an observable that is resilient to reconnections.
const appQueries = {}
for (const query in rawAppQueries) {
  // when we reconnect, switchMap will unsubscribe from the old stream, and subscribe to the new
  // one, and everything will stay the same.
  appQueries[query] = hzReconnector.switchMap(rawAppQueries[query])
}
// Now in your app you can just use:
appQueries.birdQuery.subscribe(birds => console.log('Birds:', birds))
So one limitation of this, which is fixable but makes the code above a bit more convoluted, is that when we reconnect the query, we'll re-emit the initial state again. Most virtual DOM implementations (like React) won't re-render if the state is actually identical, but if you aren't using a virtual DOM this may make a difference and you'll want to de-dupe your state events.
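As a rough illustration of de-duping, the sketch below wraps a subscriber callback so that consecutive identical states are dropped. `dedupe` is a hypothetical helper, and comparing via `JSON.stringify` assumes the state is plain JSON data with a stable key order. With RxJS, the `distinctUntilChanged` operator with a custom comparator serves the same purpose.

```javascript
// Hypothetical helper: wraps a subscriber so consecutive identical states are dropped.
// Assumes states are plain JSON data with a stable key order.
function dedupe(next) {
  let hasLast = false; // whether any state has been delivered yet
  let lastKey;         // serialized form of the last delivered state
  return state => {
    const key = JSON.stringify(state);
    if (hasLast && key === lastKey) return; // same as last time, skip it
    hasLast = true;
    lastKey = key;
    next(state);
  };
}

const rendered = [];
const render = dedupe(birds => rendered.push(birds));

render([{ id: 1, name: 'crow' }]);
render([{ id: 1, name: 'crow' }]); // re-emitted initial state after a reconnect
render([{ id: 1, name: 'crow' }, { id: 2, name: 'raven' }]);
console.log(rendered.length);

// In the scheme above this could be used as (untested):
// appQueries.birdQuery.subscribe(dedupe(birds => console.log('Birds:', birds)))
```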
I would like to add that to achieve the desired effect with RxJS 5 and my current version of Horizon (^2.0.0), I had to make the following modifications:
// Each value is a function taking a Horizon instance and returning an Observable
const rawAppQueries = {
  userQuery: hz => hz('users').find(3).watch(),
  birdQuery: hz => hz('birds').findAll({family: 'corvid'}).watch(),
  // ... all other watch() queries in your app
  // fetch() and store() etc are one-shot so they won't need to be refreshed.
}
// An observable that gives an infinite stream of new Horizon instances on each disconnect
const hzReconnector = Rx.Observable.create(subscriber => {
  const reconnector = () => {
    let hz = Horizon(hzOptions)
    hz.onDisconnected(reconnector)
    hz.onReady(() => subscriber.next(hz));
    setTimeout(hz.connect, 5000);
  }
  reconnector() // kick off the chain
}).share() // share the Horizon instances between queries, important!
// Now, map each query into an observable that is resilient to reconnections.
const appQueries = {}
for (const query in rawAppQueries) {
  // when we reconnect, switchMap will unsubscribe from the old stream, and subscribe to the new
  // one, and everything will stay the same.
  appQueries[query] = hzReconnector.switchMap(rawAppQueries[query]).retry();
}
// Now in your app you can just use:
appQueries.birdQuery.subscribe(birds => console.log('Birds:', birds))
It will, as mentioned, re-emit the initial state, which isn't a problem in my case.