Status of Server-sent events
Hi,
First of all, I'm really glad that this project is still very well maintained, and I want to thank all the contributors!
With HTTP/2, server-sent events don't have the limitations that led to the use of WebSockets in the past. Has implementing SSE for NOTIFY natively in PostgREST been discussed? Is this something that could end up here, or should it be developed as an extension instead?
I like the advantages of SSE (it's plain HTTP, not full-duplex, etc.) over WebSockets, and HTTP/2 sounds even better. Here's an overview I've come across.
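For context, the SSE wire format itself is very simple, which is part of the appeal: events are plain text `data:` lines terminated by a blank line, served as `text/event-stream`. A minimal sketch (illustrative code, not anything from PostgREST):

```python
# Sketch of SSE event framing: "id:"/"event:"/"data:" lines, blank-line terminated.
import json

def sse_event(payload, event=None, event_id=None):
    """Serialize one server-sent event frame as text."""
    lines = []
    if event_id is not None:
        lines.append("id: " + event_id)
    if event is not None:
        lines.append("event: " + event)
    # Multi-line data becomes one "data:" line per line of payload.
    for line in json.dumps(payload).splitlines():
        lines.append("data: " + line)
    return "\n".join(lines) + "\n\n"

print(sse_event({"age": 12}, event="people", event_id="1"))
```

The `id:` field is what lets a reconnecting client resume via the `Last-Event-ID` header, which is one of the features that makes SSE attractive over hand-rolled long polling.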
SSE hasn't been discussed until now. I haven't done much research on this, but I think solutions for streaming changes can get complex (they could evolve toward logical decoding instead of LISTEN/NOTIFY, NOTIFY has a payload size limit, etc.). So far we've been recommending handling change streams through other systems (like RabbitMQ).
So I think it'd be best to first develop it as an extension or a fork. Then we could discuss including it in core.
Related project: https://github.com/docteurklein/pg-sse
I think that if PostgREST is to be extended with SSE, it should simply stream the results of a query, like Hasura's subscriptions, which behind the scenes poll the database with the supplied query. LISTEN/NOTIFY should be consumed by other tools that transform the notifications into SSE, WS, AMQP, or whatever suits your use case.
see also https://github.com/docteurklein/haskell-pg-sse/
@nerfpops I don't see how you can continuously stream query results in Postgres. The TimescaleDB and PipelineDB extensions offer something like continuous views, but that hardly translates to SSE. What would you be polling, and how would you know what needs to be/has been sent to the subscriber? (I don't see it detailed in https://github.com/hasura/graphql-engine/blob/master/architecture/live-queries.md#implementing-graphql-live-queries)
I experimented a little with the listen/notify approach and ended up sending the sql that the subscriber should execute in the notification itself, allowing dynamic content without hitting payload size limits and reusing the same query result for multiple listeners of the same topic:
```sql
perform pg_notify(topic, json_build_object(
  'sql', 'select row_to_json(message) from cardio.message where message_id = $1::uuid',
  'params', array[message.message_id]
)::text)
from unnest(message.topics || array['*', 'message', message.type]) as topic;
```
The subscriber side executes the statement and sends the result to the SSE client: https://github.com/docteurklein/haskell-pg-sse/blob/master/Main.hs#L42
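The subscriber side of this "SQL in the payload" pattern can be sketched in a few lines. This assumes the payload is exactly the JSON built by the `pg_notify` snippet above; the function name and the execution comment are illustrative, not the linked project's actual code:

```python
# Sketch: decode a NOTIFY payload of the form {"sql": "... $1 ...", "params": [...]}
import json

def handle_notification(payload):
    """Decode a notification; return (sql, params) ready for execution."""
    msg = json.loads(payload)
    sql, params = msg["sql"], msg["params"]
    # With a real connection the subscriber would bind params positionally
    # ($1 -> params[0], after translating placeholders to the driver's
    # syntax) and stream each resulting row to the SSE client.
    return sql, params

sql, params = handle_notification(
    '{"sql": "select row_to_json(m) from cardio.message where message_id = $1::uuid",'
    ' "params": ["00000000-0000-0000-0000-000000000000"]}'
)
```

The nice property, as described above, is that the payload stays small regardless of row size, and all listeners on a topic execute the same statement, so the result can be shared.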
What about implementing Mercure?
@docteurklein I don't think LISTEN/NOTIFY is a great fit here at all. PostgREST generates a SQL query based on the query string (say /people?age=lt.13), and if SSE were to be included in core, my preference is that the response of that generated query is streamed this way. When I say streamed, I mean that PostgREST polls Postgres with the aforementioned query, then streams the result via SSE.
When I mention Hasura, it's because this is similar to the approach they take. Take a look at this part: https://github.com/hasura/graphql-engine/blob/master/architecture/live-queries.md#user-content-when-do-we-refetch
> **When do we refetch?**
>
> We experimented with several methods of capturing events from the underlying Postgres database to decide when to refetch queries.
>
> - Listen/Notify: Requires instrumenting all tables with triggers, events consumed by consumer (the web-server) might be dropped in case of the consumer restarting or a network disruption.
> - WAL: Reliable stream, but LR slots are expensive which makes horizontal scaling hard, and are often not available on managed database vendors. Heavy write loads can pollute the WAL and will need throttling at the application layer.
>
> After these experiments, we’ve currently fallen back to interval based polling to refetch queries. So instead of refetching when there is an appropriate event, we refetch the query based on a time interval. There were two major reasons for doing this:
>
> - Mapping database events to a live query for a particular client is possible to some extent when the declarative permissions and the conditions used in the live queries are trivial (like `order_id = 1 and user_id = cookie.session_id`) but becomes intractable for anything complicated (say the query uses `'status' ILIKE 'failed_%'`). The declarative permissions can also sometimes span across tables. We made significant investment in investigating this approach coupled with basic incremental updating and have a few small projects in production that takes an approach similar to this talk.
> - For any application unless the write throughput is very small, you’ll end up throttling/debouncing events over an interval anyway.
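The interval-based refetch strategy Hasura describes can be sketched simply: re-run the generated query on a timer and only push when the result set changes. This is an illustrative sketch, not Hasura's or PostgREST's code; `run_query` stands in for whatever executes the generated SQL:

```python
# Sketch: interval polling with a result-set hash to suppress duplicate pushes.
import hashlib
import json
import time

def poll(run_query, push, interval=1.0, rounds=3):
    """Re-run `run_query` every `interval` seconds; call `push` only on change."""
    last_digest = None
    for _ in range(rounds):
        rows = run_query()  # e.g. the SQL generated for /people?age=lt.13
        digest = hashlib.sha256(
            json.dumps(rows, sort_keys=True).encode()
        ).hexdigest()
        if digest != last_digest:  # result changed -> emit an event
            push(rows)
            last_digest = digest
        time.sleep(interval)
```

As the quoted text notes, the hard part is not the loop but deduplicating work across many subscribers of the same query, which is where Hasura's multiplexing comes in.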
Regarding how to know what has changed, and what to pass on to the subscriber: I don't know; it could be interesting to see how Hasura solved that.
Is this on the roadmap at all?
+1
+1
Hello,
> I like the advantages of SSE (it's HTTP, not full-duplex, etc.) over WebSockets. HTTP/2 sounds even better.

> SSE wasn't discussed until now. I haven't done much research about this, but I think solutions for streaming changes can get complex (could evolve to logical decoding instead of LISTEN/NOTIFY, NOTIFY has a payload limit, etc.).
Sorry if this is a misunderstanding, but does supporting SSE have to involve LISTEN/NOTIFY?
It would suffice if an RPC processing a request could set a SQL session variable, the same way response headers are set today. The variable would be a dict mapping topic names to payload messages, streamed to those topics once the current request's transaction commits. In the configuration, an option similar to pre-request would be needed to perform authorization checks on clients using PUT/DELETE/GET on a special /topics/../ URI. If no one is subscribed, the payload is lost.
I understand such a feature would be out of scope for PostgREST?
Currently I'm imagining it could be done by setting a custom response header that other middleware would pick up, pinging a specialized SSE hub to route the event to connected clients.
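The header hand-off idea could look roughly like this: a wrapper in front of the app pops a custom header (here hypothetically named `X-SSE-Topics`, carrying the topic-to-payload dict as JSON) and forwards each entry to the SSE hub, so the backend itself never tracks subscribers. A minimal WSGI-style sketch under those assumptions:

```python
# Sketch: middleware that strips a hypothetical X-SSE-Topics header and
# publishes its {topic: message} entries to an external SSE hub.
import json

def sse_handoff(app, publish):
    """Wrap a WSGI app; `publish(topic, message)` talks to the SSE hub."""
    def wrapped(environ, start_response):
        def capture(status, headers, exc_info=None):
            kept = []
            for name, value in headers:
                if name.lower() == "x-sse-topics":  # hypothetical header name
                    for topic, message in json.loads(value).items():
                        publish(topic, message)  # fan out after the response
                else:
                    kept.append((name, value))  # header never reaches clients
            return start_response(status, kept, exc_info)
        return app(environ, capture)
    return wrapped
```

Since PostgREST only sets headers after a successful transaction, the "payload is lost if no one is subscribed" semantics described above fall out naturally: the hub is fire-and-forget.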