sirix
Support for realtime web
In a first step, we should stream the data changes for a resource to interested clients (we could simply stream the JSON we already generate, which tracks the changes).
In a second step we can support queries that target a single resource. Naively, we'd probably have to execute the query/queries asynchronously once a transaction has been flushed to disk, and check that only result nodes in the current transaction intent log are streamed to interested clients. Maybe we can cache the compiled query at least until indexes are added or dropped.
Better ideas are of course welcome. A simpler and much more efficient approach would be to check for updates on certain paths in the resource.
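The path-based approach could look roughly like the following sketch. Note that none of these names (`PathSubscriptions`, `interestedClients`, ...) exist in Sirix; this is only a hypothetical illustration of matching changed paths against subscribed path prefixes instead of re-running full queries:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: clients register path prefixes; after a commit we match
// each changed path against the registered prefixes to find interested clients.
final class PathSubscriptions {
  private final Map<String, List<String>> prefixToClients = new ConcurrentHashMap<>();

  void subscribe(String clientId, String pathPrefix) {
    prefixToClients.computeIfAbsent(pathPrefix, p -> new CopyOnWriteArrayList<>())
                   .add(clientId);
  }

  // Returns the clients whose subscribed prefix matches the changed path.
  List<String> interestedClients(String changedPath) {
    return prefixToClients.entrySet().stream()
        .filter(e -> changedPath.startsWith(e.getKey()))
        .flatMap(e -> e.getValue().stream())
        .toList();
  }
}
```

A real implementation would of course need proper JSON path semantics rather than string prefixes, but the cost per commit is then a prefix match instead of a query re-execution.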
How does RethinkDB implement this?
I wonder if looking at https://github.com/pubkey/event-reduce would be helpful.
Thanks Moshe, do you understand the basic algorithm they are using? Maybe it's just too late in the day, but I don't see how they determine whether a new record from a change event is a query result or not.
I don't know; I haven't really looked at the implementation, so I only know what the README says.
Hello! I am interested in working on this issue. Would that be ok? I was thinking of using WebSockets to stream the JSON. Also, could you specify which part I should focus on first?
You can start by streaming updates to subscribed clients: https://github.com/sirixdb/sirix/blob/982a346f6dbceeade7cf581199c47c910804f14d/bundles/sirix-core/src/main/java/io/sirix/access/trx/node/AbstractNodeTrxImpl.java#L297
Maybe we can add a post commit hook to listen for changes.
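A post-commit hook could be sketched as a plain observer pattern. The names below (`PostCommitListener`, `CommitNotifier`, `fireCommit`) are hypothetical and do not exist in Sirix; the idea is just that the transaction notifies registered listeners with the serialized JSON diff once the commit is durable:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of a post-commit hook: listeners receive the serialized
// JSON diff for a resource after the transaction has been flushed to disk.
interface PostCommitListener {
  void onCommit(String resourceName, String jsonDiff);
}

final class CommitNotifier {
  private final List<PostCommitListener> listeners = new CopyOnWriteArrayList<>();

  void register(PostCommitListener listener) {
    listeners.add(listener);
  }

  // Called by the transaction once the commit is durable.
  void fireCommit(String resourceName, String jsonDiff) {
    for (PostCommitListener listener : listeners) {
      listener.onCommit(resourceName, jsonDiff);
    }
  }
}
```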
So, I think it would be great if clients could subscribe to a database/resource update stream, so that they receive what has been changed as a JSON change stream (we already write these changes to JSON files on disk).
So we need a simple pub/sub mechanism that checks read-access rights and writes the changes to the WebSocket.
Hello again! @FayKounara and I have made some progress. We have created a pub/sub mechanism using Apache Kafka, but it would be helpful if you could clarify who counts as an interested client. Also, we are using WebSockets and Nginx, if that's ok.
@ElenaSkep oh wow, I think for our use case we should keep it simple and use a non-distributed pub/sub mechanism (maybe there's already a solution in Vert.x). Interested clients would for instance be browsers (IMHO it would be great to have a web-based GUI with views to either query or show the diffs between revisions). In the general case a Kafka-based backend would be nice, but I think I'd implement that as a new storage solution in the io package (analogous to the FileChannel or MMStorage backends).
So, I think a web client (for instance the TypeScript-based client) would subscribe via a WebSocket to a database / resource in the database and would subsequently receive all changes in the current JSON format. A future web frontend could use this to display the differences.
You'd probably use a simple thread-safe blocking queue to handle the subscribers...
In general, it should be part of the sirix-rest-api bundle.
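The blocking-queue idea could be sketched like this, with one bounded queue per subscriber (all names here are hypothetical, not part of Sirix). A per-subscriber writer thread would drain its queue and write to the WebSocket; a non-blocking `offer()` keeps a slow client from stalling commits:

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: fan out JSON change events to one bounded blocking
// queue per subscriber. A writer thread per subscriber would drain the queue
// and push the events over that client's WebSocket.
final class ChangeFanOut {
  private final Map<String, BlockingQueue<String>> queues = new ConcurrentHashMap<>();

  BlockingQueue<String> subscribe(String clientId) {
    return queues.computeIfAbsent(clientId, id -> new ArrayBlockingQueue<>(1024));
  }

  void unsubscribe(String clientId) {
    queues.remove(clientId);
  }

  void publish(String jsonChange) {
    // offer() never blocks; if a queue is full, that slow subscriber drops the event.
    queues.values().forEach(queue -> queue.offer(jsonChange));
  }
}
```

Whether to drop events or disconnect a slow subscriber is a policy decision; dropping is shown here only because it keeps the publisher non-blocking.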
Hello! So now, whenever there is a change in a database/resource, we keep it in a topic (provided by Apache Kafka). What if we push this topic to a WebSocket? Is this something that would be valuable?
To be fair, I think that's too much overhead. It would be more valuable to provide another storage type for Kafka, i.e. to store the pages in Kafka instead of, or asynchronously in addition to, a local file.
Sorry, we got a little confused. Currently we have changed the serializeUpdateDiffs method so that when there is a change in a database it is saved to a topic. So when someone runs Sirix, the user sees the result of the query but also all the changes that have happened in this specific resource/database. Should we proceed with this or look at something else?
So, do you use the JSON format? Instead of using Kafka, I think it would be nice to have a new route in sirix-rest-api where you can subscribe and receive changes via a WebSocket. I'd rather envision Kafka storing the whole resource as an alternative storage backend, what do you think? Sorry for the confusion; I'm not sure whether a Kafka change stream would also make sense.
In any case it would be nice to have both: a Vert.x-based solution and maybe also the Kafka-based one.
Yes, we use the JSON format. Okay, we will look into what you suggested. Thank you for clarifying!
Ok, so we will make a pull request for what you asked, with the WebSockets directly listening for changes in the database. Since we have also created an implementation with Apache Kafka, should we open another issue for that enhancement?
Let's see, you can of course make two PRs, but we should never create a hard dependency on Kafka, as Sirix can also simply be used as an embedded library in other JVM-language-based projects.
And BTW: thanks for all the work. I hope you'll also contribute in the future...