datalevin
Datalevin for Nodejs/javascript
Datalevin is an amazing project. I want to use Datalog with Node.js. Is it possible to compile Datalevin to JavaScript with ClojureScript?
Not right now. However, it should be possible to do.
I will be abstracting out common LMDB protocols so different platforms can implement them in their own LMDB bindings. Right now, I am trying to get Datalevin to work with GraalVM native image; hopefully I can get it to work. I am writing a new LMDB Java wrapper in order to do this, since LMDBJava is incompatible with native image. Once this is done, I will work on the protocols. The goal is to get a native Datalevin command line shell.
Once the protocols are in place, we can build a Node.js version of Datalevin fairly easily, since the high-level code in Datalevin is ClojureScript compatible. The remaining area of work is to abstract out bit encoding, as JavaScript has some differences from Java in terms of data types, but it should not be too much work.
Anyway, I can use some help if people are interested in adopting Datalevin to other platforms.
Another possibility, which saves the effort of maintaining too many LMDB bindings in Datalevin, is to compile the native image of Datalevin as a shared C library, so that other languages or environments can link to it, be it Node.js, Python, or whatever, as long as that environment can handle EDN data.
If my effort of compiling Datalevin to native is successful, I would prefer this approach.
Now native Datalevin is finished. We also defined a set of LMDB protocols that have implementations in both Java and GraalVM. The same can be done for node.js as well. This would be a good project for someone familiar with node.js to step in to implement.
thanks for making Datalevin,
I want to help with the Node.js binding, but I also have a different scenario in mind, such as using another database binding for React Native.
From my understanding of the protocols in lmdb.cljc, basically any key-value database with a range-scan feature should work with Datalevin, given the right binding.
Since this can be any key-value database implementation, I propose adding an optional parameter (for example {:open-kv :new-db-driver-other-than-graal-and-java}) to the open functions, so that lmdb/open-kv and related functions accept more than :graal and :java, and passing this parameter up to the surface API (d/get-conn).
The benefit is some room for me and other people to try implementing an "lmdb binding" (which seems more like a key-value database driver) for Node.js and other clients, since a basic driver is easy to make but hard to tune to production grade.
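To make "a key-value database with a range-scan feature" concrete, here is a minimal Python sketch of that idea. It is only an illustration: `SortedKV`, `put`, `get`, and `range_scan` are hypothetical names, not Datalevin's protocols, and a toy in-memory store like this has none of the LMDB-specific features the real protocols rely on.

```python
import bisect


class SortedKV:
    """Toy in-memory ordered key-value store (hypothetical, for illustration)."""

    def __init__(self):
        self._keys = []   # keys kept in sorted order
        self._vals = {}   # key -> value

    def put(self, key, value):
        # Insert the key into the sorted list only the first time we see it.
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key, default=None):
        return self._vals.get(key, default)

    def range_scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for k in self._keys[lo:hi]:
            yield k, self._vals[k]
```

A range scan over byte-string keys sharing a prefix would then look like `list(kv.range_scan(b"a:", b"b"))`.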
The protocols in the lmdb namespace are specifically tailored for LMDB. For example, DBI, renewable read only transaction, and so on, are LMDB specific features that do not exist in other key value stores. Of course, one may be able to shoehorn another key value store in there, but I am not sure it is worth the effort, for it would be simpler to get LMDB to work on whatever platform you are interested in. As far as I can tell, LMDB runs on everything. Its source code is a single C file, after all.
I personally find pluggable storage a not so good idea for a database system, where performance is of the highest importance. Each storage has its own idiosyncrasies, so a pluggable storage system has to cater to the lowest common denominator, which means the performance is going to be mediocre at best. If I were interested in that, I wouldn't have started Datalevin to begin with. There are already plenty of options for that. I have used databases with pluggable storage. We switched between different storages, and they all had their own problems. I would rather deal with the problems of one storage.
open-kv is a multi-method, so if someone implemented another LMDB binding, say, node.js, we will just add a :node target in there. I don't see a need to change the function signature.
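As a rough illustration of why a new target does not require a signature change, here is a Python analogue of dispatching on a target value. This is a sketch under assumptions, not Datalevin's code: the registry, the `register_open_kv` decorator, and the returned dictionaries are all hypothetical stand-ins for what a Clojure multimethod and real bindings would do.

```python
# Registry mapping a target name to the function that opens a store for it.
# This mimics, loosely, a multimethod that dispatches on a target value.
_OPEN_KV_IMPLS = {}


def register_open_kv(target):
    """Decorator that registers an implementation for one target."""
    def decorator(fn):
        _OPEN_KV_IMPLS[target] = fn
        return fn
    return decorator


def open_kv(target, path):
    """Single entry point; its signature never changes as targets are added."""
    try:
        impl = _OPEN_KV_IMPLS[target]
    except KeyError:
        raise ValueError(f"no LMDB binding registered for target {target!r}")
    return impl(path)


@register_open_kv("java")
def _open_java(path):
    # Placeholder result; a real binding would return a store handle.
    return {"binding": "java", "path": path}


@register_open_kv("graal")
def _open_graal(path):
    return {"binding": "graal", "path": path}


# Supporting a new platform is one more registration, nothing else changes.
@register_open_kv("node")
def _open_node(path):
    return {"binding": "node", "path": path}
```

Callers keep writing `open_kv("node", "/tmp/db")` exactly as they would for the existing targets.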
thanks for your explanation. I respect your design decision. Keep up the great work.
On Sun, Mar 7, 2021 at 1:15 PM Huahai Yang wrote:
The protocols in the lmdb namespace are specifically tailored for LMDB. For example, DBI, renewable read only transaction, and so on, are LMDB specific features that do not exist in other key value stores. Of course, one may be able to shoehorn another key value store in there, but I am not sure it is worth the effort, for it would be simpler to get LMDB to work on that platform. As far as I can tell, LMDB runs on everything. Its source code is a single C file, after all.
To be honest, if someone insists on using another key value store, my suggestion is to look elsewhere, e.g. Datahike. They seem to offer pluggable storage. I personally find pluggable storage a silly idea for a database system, where performance is of the highest importance. Each storage has its own idiosyncrasies, so a pluggable storage system has to cater to the lowest common denominator, which means the performance is going to be mediocre at best. If I could live with that, I wouldn't have started Datalevin to begin with. There are already plenty of options for that.
open-kv is a multi-method, so if someone implemented another binding, say, node.js, we will just add a :node target in there. I don't see a need to change the function signature.
If it helps at all, I work on/maintain the NodeJS/LMDB binding package (https://github.com/DoctorEvidence/lmdb-js). It has really good performance characteristics, and I would be glad to help with making it work for a NodeJS version of Datalevin, if that is of interest.
That sounds promising. I certainly welcome such an addition, and can lend a helping hand whenever necessary.
I can confirm that lmdb-js has astonishing performance and the interaction with cljs is seamless. I even thought about porting dtlv to nodejs, but the task is beyond my available time :(
So it seems there are some options for the JS ecosystem. How about Python? I would like to use Datalevin to store intermediate processing results. The process involves Clojure steps and Python steps. They use RabbitMQ now, but that is not optimal. I would like to run the steps as batch processes:
- Run clojure step, save to datalevin
- Run python script - read from datalevin, store in datalevin
- Run Clojure process to finish the job.
Some things I looked at:
- Not sure if this is useful in the future - https://substrait.io/
- Looking at implementing Apache Calcite SQL (Datalog) engine on top of Datalevin and using Avatica JDBC for connectivity. But there is no good driver for avatica python either. https://calcite.apache.org/avatica/
Probably the C library approach would be simplest in this case.
Since no one has stepped up, we will go the C library route: provide Datalevin as a native C shared library that speaks JSON. This is the route that creates minimal work for me, yet caters to the widest range of users, since every environment can work with a C library that speaks JSON: Node, Python, or whatever.
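To give a feel for what "a C library that speaks JSON" could look like from the caller's side, here is a minimal Python sketch. Everything specific in it is an assumption: the message shape (`op`, `args`, `result`, `error`), the operation names, and the `make_fake_native` stand-in are hypothetical, since the thread does not define the library's actual interface. With a real shared library, `native_fn` would be a thin wrapper around a function loaded via `ctypes`.

```python
import json


def call_json(native_fn, op, **args):
    """Encode a request as JSON bytes, call the native entry point, decode the reply."""
    request = json.dumps({"op": op, "args": args}).encode("utf-8")
    raw = native_fn(request)                     # bytes in, bytes out
    response = json.loads(raw.decode("utf-8"))
    if "error" in response:
        raise RuntimeError(response["error"])
    return response["result"]


def make_fake_native():
    """Build a stand-in for the native function, backed by an in-memory dict.

    Hypothetical: it only exists so this sketch runs without a real library.
    """
    store = {}

    def native(request_bytes):
        req = json.loads(request_bytes.decode("utf-8"))
        op, args = req["op"], req["args"]
        if op == "put":
            store[args["key"]] = args["value"]
            out = {"result": True}
        elif op == "get":
            out = {"result": store.get(args["key"])}
        else:
            out = {"error": "unknown op: " + op}
        return json.dumps(out).encode("utf-8")

    return native
```

The appeal of this shape is that the boundary only ever carries bytes, so any environment with a JSON library and a C foreign function interface can use it the same way.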