sirix
Support for realtime web
In a first step, we should stream the data changes for a resource to interested clients (we could simply stream the JSON we already generate, which tracks the changes).
In a second step we can support queries that target a single resource. Naively, we'd probably have to execute the query/queries asynchronously once a transaction has been flushed to disk, and check that only result nodes in the current transaction intent log are streamed to interested clients. Maybe we can cache the compiled query at least until indexes are added or dropped.
Better ideas are of course welcome. A simpler and much more efficient approach would be to check for updates on certain paths in the resource.
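The path-based approach could look roughly like the following sketch. Note that none of these names (`PathSubscriptions`, `interestedClients`, ...) exist in Sirix; this is only a hypothetical illustration of matching changed paths against subscribed path prefixes instead of re-running full queries:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: clients register path prefixes; after a commit we match
// each changed path against the registered prefixes to find interested clients.
final class PathSubscriptions {
  private final Map<String, List<String>> prefixToClients = new ConcurrentHashMap<>();

  void subscribe(String clientId, String pathPrefix) {
    prefixToClients.computeIfAbsent(pathPrefix, p -> new CopyOnWriteArrayList<>())
                   .add(clientId);
  }

  // Returns the clients whose subscribed prefix matches the changed path.
  List<String> interestedClients(String changedPath) {
    return prefixToClients.entrySet().stream()
        .filter(e -> changedPath.startsWith(e.getKey()))
        .flatMap(e -> e.getValue().stream())
        .toList();
  }
}
```

A real implementation would of course need proper JSON path semantics rather than string prefixes, but the cost per commit is then a prefix match instead of a query re-execution.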
How does RethinkDB implement this?
I wonder if looking at https://github.com/pubkey/event-reduce would be helpful.
Thanks Moshe, do you understand the basic algorithm they are using? Maybe it's just too late in the day, but I don't see how they determine whether a new record from a change event is a query result or not.
I don't know; I haven't really looked at the implementation, so I only know what the README says.
Hello! I am interested in working on this issue. Would that be ok? I was thinking of using WebSockets to stream the JSON. Also, could you specify which part I should focus on first?
You can start by streaming updates to subscribed clients: https://github.com/sirixdb/sirix/blob/982a346f6dbceeade7cf581199c47c910804f14d/bundles/sirix-core/src/main/java/io/sirix/access/trx/node/AbstractNodeTrxImpl.java#L297
Maybe we can add a post commit hook to listen for changes.
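A post-commit hook could be sketched as a plain observer pattern. The names below (`PostCommitListener`, `CommitNotifier`, `fireCommit`) are hypothetical and do not exist in Sirix; the idea is just that the transaction notifies registered listeners with the serialized JSON diff once the commit is durable:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of a post-commit hook: listeners receive the serialized
// JSON diff for a resource after the transaction has been flushed to disk.
interface PostCommitListener {
  void onCommit(String resourceName, String jsonDiff);
}

final class CommitNotifier {
  private final List<PostCommitListener> listeners = new CopyOnWriteArrayList<>();

  void register(PostCommitListener listener) {
    listeners.add(listener);
  }

  // Called by the transaction once the commit is durable.
  void fireCommit(String resourceName, String jsonDiff) {
    for (PostCommitListener listener : listeners) {
      listener.onCommit(resourceName, jsonDiff);
    }
  }
}
```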
So, I think it would be great if clients could subscribe to a database/resource update stream, so that they receive what has been changed as a JSON change stream (we already write these changes to JSON files on disk).
So we need a simple pub/sub mechanism that checks read-access rights and writes the changes to the WebSocket.
Hello again! @FayKounara and I have made some progress. We have created a pub/sub mechanism using Apache Kafka, but it would be helpful if you could clarify who counts as an interested client. Also, we are using WebSockets and Nginx, if that's ok.
@ElenaSkep oh wow, I think for our use case we should keep it simple and use a non-distributed pub/sub mechanism (maybe there's already a solution in Vert.x). Interested clients would for instance be browsers (IMHO it would be great to have a web-based GUI with views to either query or show the diffs between revisions). In the general case a Kafka-based backend would be nice, but I think I'd implement that as a new storage solution in the io package (analogous to the FileChannel or MMStorage backends).
So, I think a web client (for instance the TypeScript-based client) would subscribe via a WebSocket to a database / resource in the database and would subsequently receive all changes in the current JSON format. A future web frontend could use this to display the differences.
You'd probably use a simple thread-safe blocking queue to handle the subscribers...
In general, it should be part of the sirix-rest-api bundle.
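The blocking-queue idea could be sketched like this, with one bounded queue per subscriber (all names here are hypothetical, not part of Sirix). A per-subscriber writer thread would drain its queue and write to the WebSocket; a non-blocking `offer()` keeps a slow client from stalling commits:

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: fan out JSON change events to one bounded blocking
// queue per subscriber. A writer thread per subscriber would drain the queue
// and push the events over that client's WebSocket.
final class ChangeFanOut {
  private final Map<String, BlockingQueue<String>> queues = new ConcurrentHashMap<>();

  BlockingQueue<String> subscribe(String clientId) {
    return queues.computeIfAbsent(clientId, id -> new ArrayBlockingQueue<>(1024));
  }

  void unsubscribe(String clientId) {
    queues.remove(clientId);
  }

  void publish(String jsonChange) {
    // offer() never blocks; if a queue is full, that slow subscriber drops the event.
    queues.values().forEach(queue -> queue.offer(jsonChange));
  }
}
```

Whether to drop events or disconnect a slow subscriber is a policy decision; dropping is shown here only because it keeps the publisher non-blocking.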
Hello! So now, whenever there is a change in a database/resource, we keep it in a topic (provided by Apache Kafka). What if we push this topic to a WebSocket? Is this something that would be valuable?
To be fair, I think that's too much overhead. It would be more valuable to provide another storage type for Kafka, i.e. to store the pages in Kafka instead of, or asynchronously in addition to, a local file.
Sorry, we got a little confused. Currently we have changed the serializeUpdateDiffs method so that when there is a change in a database it is saved to a topic. So when someone runs Sirix, the user sees the result of the query but also all the changes that have happened in this specific resource/database. Should we proceed with this or look at something else?
So, do you use the JSON format? Instead of using Kafka, I think it would be nice to have a new route in sirix-rest-api where you can subscribe and receive changes via a WebSocket. I'd rather envision Kafka storing the whole resource as an alternative storage backend, what do you think? Sorry for the confusion; I'm not sure whether a Kafka change stream would also make sense.
In any case it would be nice to have both: a Vert.x-based solution and maybe also the Kafka-based one.
Yes, we use the JSON format. Okay, we will look into what you suggested. Thank you for clarifying!
Ok, so we will make a pull request for what you asked, with the WebSockets directly listening for changes in the database. Since we have also created an implementation with Apache Kafka, should we open another issue for that enhancement?
Let's see, you can of course make two PRs, but we should never create a hard dependency on Kafka, as Sirix can also simply be used as an embedded library in other JVM-language-based projects.
And BTW: thanks for all the work. I hope you'll also contribute in the future...