vivace-graph-v3

ready for production?

Open ychakiris opened this issue 5 years ago • 16 comments

This looks very interesting. How close is it to "production ready?"

ychakiris avatar Feb 03 '19 02:02 ychakiris

Yes, I have been using it in production environments for quite a few years. Are there specific features that would make it production-ready in your mind?

kraison avatar Feb 03 '19 02:02 kraison

Mainly reliability and performance. I realize these terms don't have one-dimensional definitions, so I mean them in a simple-minded way: "reliability" = not losing data, plus availability; "performance" = fast enough to be usable, and being able to find the things that were put in there.

I am a research scientist working at an elementary school. We are trying to optimize the learning environments for children, and we use quite a bit of home-grown tech to do it: for example, cameras in all the classrooms, lots of WhatsApp texting, and some other tools like Google Apps. I like to work in Common Lisp, and it would be nice to have a graph-type DB to keep the texting in. We have quite a bit of material in WhatsApp.

ychakiris avatar Feb 03 '19 02:02 ychakiris

Perhaps you can tell me more about how you have used it in production?

ychakiris avatar Feb 03 '19 02:02 ychakiris

So, VG is ACID-compliant and pretty darn fast. Zach and I built the transaction system around an optimistic concurrency control model. Data is stored in memory-mapped files and in a transaction log. There is also a primary/secondary replication scheme built in. You can also replay transaction logs to create a new instance if you wish, and snapshotting is available. As far as getting data out, you can use the available Lisp methods or Prolog; see example.lisp.
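VG itself is written in Common Lisp; purely as a conceptual illustration of the optimistic concurrency control idea described above (buffer writes, validate reads at commit time, retry on conflict), here is a minimal single-process sketch in Python. All names here are hypothetical and nothing reflects VG's actual API:

```python
import threading

class Conflict(Exception):
    """Raised when commit-time validation detects a concurrent update."""

class Store:
    def __init__(self):
        self.data = {}      # key -> value
        self.versions = {}  # key -> version counter
        self.lock = threading.Lock()  # held only briefly, at commit

    def transact(self, fn, retries=10):
        """Run fn(txn) optimistically, retrying on conflict."""
        for _ in range(retries):
            txn = Txn(self)
            result = fn(txn)
            try:
                txn.commit()
                return result
            except Conflict:
                continue  # a concurrent transaction won; retry with fresh reads
        raise Conflict("too many retries")

class Txn:
    def __init__(self, store):
        self.store = store
        self.read_set = {}   # key -> version observed at read time
        self.write_set = {}  # key -> new value, buffered until commit

    def read(self, key):
        if key in self.write_set:
            return self.write_set[key]
        self.read_set[key] = self.store.versions.get(key, 0)
        return self.store.data.get(key)

    def write(self, key, value):
        self.write_set[key] = value

    def commit(self):
        with self.store.lock:
            # Validate: every key we read must be unchanged since we read it.
            for key, seen in self.read_set.items():
                if self.store.versions.get(key, 0) != seen:
                    raise Conflict(key)
            # Install buffered writes, bumping version counters.
            for key, value in self.write_set.items():
                self.store.data[key] = value
                self.store.versions[key] = self.store.versions.get(key, 0) + 1
```

Usage is `store.transact(lambda t: t.write("n", (t.read("n") or 0) + 1))`; the point is that no locks are held while `fn` runs, only during the short validate-and-install step.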

I have used VG as an online catalog for millions of products, as the back end for a complex, adaptable VoIP-based IVR, as the data store for several complex big-data analysis systems, and as the engine for two recommender systems.

The main bottleneck in VG is data serialization and deserialization; the system makes heavy use of caching to overcome this. Memory maps make going to and from disk quite fast, but the Lisp data structures must be pickled in order to be written to disk. I have investigated a feature some older Lisps once had: user-definable memory areas, which were extensible in much the way CLOS is. Such a technology could allow Lisp data structures to be written unadulterated to disk, eliminating the need for serialization; however, it is a big task, and I have not had the time or funding to make it happen. That said, VG has been fast enough for my purposes. Please run some benchmarks if you like; I would be happy to hear about your experiences.
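To illustrate why the serialization round trip, rather than the disk I/O itself, is the cost being described: a hypothetical Python sketch using `mmap` and `pickle` as stand-ins for the memory-mapped segments and the Lisp pickler (this is not VG's storage format):

```python
import mmap
import os
import pickle
import tempfile

# A record must be serialized before it can live in the mapped file,
# and deserialized on the way back out; that round trip is the bottleneck.
record = {"type": "vertex", "name": "alice", "edges": [1, 2, 3]}
blob = pickle.dumps(record)

path = os.path.join(tempfile.mkdtemp(), "segment.db")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)  # pre-size the segment

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 4096)
    mm[0:len(blob)] = blob                     # the "disk write" is just a memcpy
    restored = pickle.loads(mm[0:len(blob)])   # but reading back requires deserializing
    mm.close()

assert restored == record
```

The user-definable memory areas mentioned above would, in effect, let the in-memory representation and the on-disk bytes be the same thing, so the `dumps`/`loads` step would disappear.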

The project is also looking for contributors, as it lacks sufficient docs and has a few warts (see the other issues here on GitHub) that need addressing.

kraison avatar Feb 05 '19 09:02 kraison

Hi @kraison, do you have any documentation or a "quick start" guide? I was looking for CL libs for RDF/OWL and recently made some small improvements to Wilbur (https://github.com/arademaker/incf-wilbur). I didn't know VG was so mature; maybe I could play with it and contribute.

arademaker avatar Feb 05 '19 10:02 arademaker

Please see the wiki for a very basic tutorial.

kraison avatar Feb 05 '19 11:02 kraison

It looks like people are actually interested in the project, so I will make an effort to provide more documentation when I am home from traveling next week.

kraison avatar Feb 05 '19 12:02 kraison

@kraison Thanks for the information!!

Definitely intrigues me enough to load some data into it and experiment to see whether it fits my use case, and to see how easy it is to use with no documentation (other than the source code).

ychakiris avatar Feb 05 '19 18:02 ychakiris

> The main bottleneck in VG is data serialization and deserialization; the system makes heavy use of caching to overcome this. memory maps make going to and from disk quite fast, but the Lisp data structures must be pickled in order to be written to disk. I have investigated a feature of some older Lisps that once had user-definable memory areas; these memory areas were extensible in much the way CLOS is. Using such a technology could allow for writing Lisp data structures unadulterated to disk, which would eliminate the need for serialization; however, it is a big task and I have not had time nor funding to make it happen.

For user-defined memory areas you might want to take a look at two Common Lisp projects: cl-mpi and static-vectors. I saw the cl-mpi project in a YouTube video on high-performance computing, and the presenter mentioned that for MPI to work properly (via CFFI) one needs large memory regions that don't move. It seems he uses static-vectors for that purpose.

Not sure this is fully relevant, but it might be worth a look. The video is interesting in its own right.

ychakiris avatar Feb 05 '19 18:02 ychakiris

@ychakiris do check out the GitHub wiki for the project for some usage examples.

kraison avatar Feb 06 '19 06:02 kraison

Very good!! I will work through it.

I will be modelling parts of what I will call the "behavioral ecology" of a Montessori (hybrid) elementary school. Let's say at the lowest level of modeling there are what I will call "actors" and "events." Actors can be both human and non-human (à la Bruno Latour), and events are simply changes in the configuration of actors.

Some examples:

  1. A child is sitting at a table doing some work on a worksheet in a classroom. The actors would be the child, the table, chairs, worksheet, the other parts of the classroom, etc. Events would occur for each change in this configuration (e.g. doing a problem, or a friend stopping by to talk, etc)
  2. At "circle time" the teacher and students are all sitting around the circumference of a large rug listening to a lesson using Montessori materials. Actors include the children, teacher, rug, etc.
  3. There are four cameras in the classroom continuously recording. This system is also made up of actors and events.
  4. Teachers and staff members have smart phones and are constantly using whatsapp to record comments and discussion about the classroom ecology. Actors here are all the messages, phones, teachers, staff members, etc.

Looking at actors and events, and classifying them within the ecology via their behavior analytics (behavioral history, reinforcing events, etc.), is the most natural way to model this and store it in a database of some sort. Events that represent the interaction of actors are the most interesting part.

Since each actor has a behavioral history (all the events that occurred to them), this is clearly immutable data. Once an event occurs, it will never be changed; however, the interpretations of that event can certainly change.

Seems to me there will be a lot of immutable data in this.

ychakiris avatar Feb 06 '19 16:02 ychakiris
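A minimal sketch of this actor/event model, using frozen (immutable) records, with Python purely for illustration; the names `Actor`, `Event`, and `history` are hypothetical, not anything from VG:

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class Actor:
    """A human or non-human participant; identity only, no mutable state."""
    actor_id: str
    kind: str  # e.g. "child", "teacher", "rug", "camera", "message"

@dataclass(frozen=True)
class Event:
    """An immutable change in the configuration of actors."""
    event_id: str
    timestamp: float
    actors: FrozenSet[Actor]
    description: str

def history(events, actor):
    """An actor's behavioral history: every event it participated in, in order."""
    return tuple(e for e in sorted(events, key=lambda e: e.timestamp)
                 if actor in e.actors)
```

Since events are frozen, histories only ever grow by appending new events; mutable interpretations would live in a separate layer that references events by `event_id` rather than modifying them.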

Some questions:

  1. Let's say we have 50K text messages of varying length, averaging about 1 KB each. How much space would you estimate that would take in your database?
  2. Is there a way of using the FSet library with your graph DB to handle immutable data?

ychakiris avatar Feb 06 '19 16:02 ychakiris
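(A rough back-of-envelope for question 1; the 3x overhead factor for serialization, indexes, and segment slack is a guess for illustration, not a measured VivaceGraph figure.)

```python
messages = 50_000
avg_bytes = 1_000            # ~1 KB average per message, from the question
raw = messages * avg_bytes   # 50 MB of raw text

# Assumed multiplier for pickling overhead, indexes, and slack in the
# memory-mapped segments; a guess, not a VivaceGraph measurement.
overhead = 3
estimated = raw * overhead

print(raw // 10**6, "MB raw,", estimated // 10**6, "MB with assumed overhead")
```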

Jumping in, I'd say that I'd be very interested in this, if there were better documentation: I did some experiments with it a while ago, but couldn't figure out a reasonable way to handle things like unique constraints.

fiddlerwoaroof avatar Feb 06 '19 21:02 fiddlerwoaroof

Why is vivace graph so fast? I have been comparing it with SQL-based approach and Neo4j, and vivace graph is much, much faster.

joshcho avatar Feb 04 '23 07:02 joshcho

Through a combination of linear hash tables, skip lists for indexing, and MCAS (multi-word compare-and-swap) for updates. I'm in the Donbas right now, so apologies for not having time to explain in more depth.

kraison avatar Feb 09 '23 09:02 kraison
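Of the three mechanisms named, the skip list is the easiest to show compactly. Below is a single-threaded Python sketch of the textbook structure only; VG's actual index is a lock-free Common Lisp implementation, and none of these names come from its source:

```python
import random

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)  # one next-pointer per level

class SkipList:
    """Textbook skip list: O(log n) expected search via layered linked lists."""
    MAX_LEVEL = 16

    def __init__(self):
        self.head = Node(None, self.MAX_LEVEL)  # sentinel; key never compared
        self.level = 0

    def _random_level(self):
        # Coin flips: each node appears in level i+1 with probability 1/2.
        lvl = 0
        while random.random() < 0.5 and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def search(self, key):
        node = self.head
        for i in range(self.level, -1, -1):  # descend level by level
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def insert(self, key):
        # Record the rightmost predecessor at each level.
        update = [self.head] * (self.MAX_LEVEL + 1)
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = Node(key, lvl)
        for i in range(lvl + 1):  # splice into every level the node occupies
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new
```

A concurrent version replaces the splice step with compare-and-swap on the forward pointers, which is where a multi-word CAS primitive becomes useful.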