convex
Secondary root hash in Etch for embedded peers
I was wondering how sensible it would be to add an extra, secondary root. This would allow an application embedding a peer to share the same Etch instance with the peer itself. I recall @pedrorgirardi expressed the desire to use Etch instead of Datalevin for convex.world; this would make that kind of use case more efficient.
My thought was that you can effectively put any structure you like in the root hash slot: it could be a map with unique fields for each application, for example. As long as the application and peer implementation agree on usage of the root hash, it should work fine.
Yes, but for instance the current peer implementation writes directly to the root and there is nothing you can do about it. Either this bit should be more flexible (providing a mapping function?) or there should be the possibility of having more than 1 root, so that parties which do not need to coordinate effectively don't (the peer doesn't care about application state and vice versa).
Well you could have a map of {app-id1 app-data1 ....} stored as the root. It's really up to the Peer implementation to decide what format is used, this is outside the protocol spec.
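To make the "map as root" idea concrete, here is a minimal sketch. The class and method names are hypothetical and not part of Convex or Etch, and plain strings stand in for whatever data each party would actually store. It only illustrates that updating one application's entry leaves the others intact, provided everyone agrees that the root is such a map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: these names are illustrative and not part of Convex or Etch.
final class SharedRootMap {
    private SharedRootMap() {}

    // Returns a new root map in which only appId's entry changes;
    // entries owned by other applications are carried over untouched.
    static Map<String, String> withAppData(Map<String, String> root, String appId, String appData) {
        Map<String, String> next = new HashMap<>(root);
        next.put(appId, appData);
        return Collections.unmodifiableMap(next);
    }
}
```

The result of withAppData would then be written back as the new root by whichever party made the change.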
We could make this pluggable I guess, have a default implementation with the current behaviour but you can override it if you like?
Indeed, peers can take many liberties, but the default implementation should be as flexible as possible. No one will, currently or in the envisioned future, build a peer from scratch :)
OK well I'm pretty happy with the proposal to let people customise the root hash persistence if they wish. Also remember there's nothing to stop someone storing their own root hash elsewhere, they don't need to use the built-in root hash facility.
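A rough sketch of what "pluggable" root persistence could look like. Everything here is an assumption for illustration: RootPersistence, StoreRootPersistence, ExternalRootPersistence and PeerStub are hypothetical names, not the Convex API, and an AtomicReference stands in for the store's root slot.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: interface and class names are assumptions, not the Convex API.
interface RootPersistence {
    String loadRoot();
    void saveRoot(String rootHash);
}

// Default behaviour as described in the thread: the peer reads and writes
// the store's single root slot directly.
final class StoreRootPersistence implements RootPersistence {
    private final AtomicReference<String> storeRoot; // stands in for the store's root slot

    StoreRootPersistence(AtomicReference<String> storeRoot) { this.storeRoot = storeRoot; }

    @Override public String loadRoot() { return storeRoot.get(); }
    @Override public void saveRoot(String rootHash) { storeRoot.set(rootHash); }
}

// Override: the peer keeps its root hash somewhere else entirely,
// leaving the store's root slot free for the embedding application.
final class ExternalRootPersistence implements RootPersistence {
    private volatile String rootHash; // in practice this might be a file or another database

    @Override public String loadRoot() { return rootHash; }
    @Override public void saveRoot(String rootHash) { this.rootHash = rootHash; }
}

// Stand-in for a peer that only talks to the persistence interface.
final class PeerStub {
    private final RootPersistence persistence;

    PeerStub(RootPersistence persistence) { this.persistence = persistence; }

    void persistState(String stateHash) { persistence.saveRoot(stateHash); }
    String restoreState() { return persistence.loadRoot(); }
}
```

With something of this shape, an embedding application would pass its own RootPersistence when creating the peer, and peers created without one would keep the current behaviour.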
It would be helpful to prioritize this for 2 reasons:
- convex-web uses Datalevin, while it looks like sharing an Etch instance with the peer server would be enough. Datalevin uses illegal reflection, so we cannot upgrade to Java 17, and it uses some native bindings, which forces me to emulate a JVM in x86, making dev quite slow.
- In the future, Convex Shell will need to share an Etch instance in case of running a peer server.
The ideal solution would allow specifying a number of roots when creating an instance (defaulting to 1). When creating a peer server, one could specify a "root index" to use (defaulting to 0).
Given a discussion we had some time ago, if I'm not mistaken, this would allow any concurrent use of Etch (in-process) without having to synchronize anything at the root level (each consumer writes to its own root).
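As an illustration of the numbered-roots proposal, here is a minimal in-memory sketch. MultiRootStore is a hypothetical name and this is not the Etch API; it only models the idea of a root count fixed at creation (defaulting to 1) and a root index per consumer (defaulting to 0), with each slot updated independently.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch: models the proposal only, not the actual Etch API or file format.
final class MultiRootStore {
    private final AtomicReferenceArray<String> roots;

    // Defaults to a single root, matching today's behaviour.
    MultiRootStore() { this(1); }

    MultiRootStore(int rootCount) {
        if (rootCount < 1) throw new IllegalArgumentException("rootCount must be at least 1");
        this.roots = new AtomicReferenceArray<>(rootCount);
    }

    int rootCount() { return roots.length(); }

    // Each consumer reads and writes only its own slot, so slots never contend with each other.
    String getRoot(int rootIndex) { return roots.get(rootIndex); }
    void setRoot(int rootIndex, String rootHash) { roots.set(rootIndex, rootHash); }

    // Convenience overloads using the default root index 0.
    String getRoot() { return getRoot(0); }
    void setRoot(String rootHash) { setRoot(0, rootHash); }
}
```

For example, a peer server could use index 0 while the embedding application uses index 1 of a store created with new MultiRootStore(2).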
Would it work if the root was a map, and you could pick an arbitrary key? (presumably peers would use their unique public key)
Yes, totally fine from a user perspective. The intent is for different in-process use cases sharing a store to each have their own "place" for remembering their own state.
However, a map would mean a locking mechanism must be put in place? And there would be a tiny perf hit for having to deal with a map.
I'm working on the assumption that you probably shouldn't set the root too often, so the locking and map performance hit are probably OK?
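For in-process use, one possible way to coordinate writers on a map-valued root without an explicit lock is an atomic compare-and-set retry, sketched below. KeyedRoot is a hypothetical name, the keys are plain strings (a peer might use its public key), and this models only the in-memory coordination, not how Etch would persist the root.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: in-memory model of a map-valued root shared by several consumers.
final class KeyedRoot {
    private final AtomicReference<Map<String, String>> root =
            new AtomicReference<>(Collections.<String, String>emptyMap());

    String get(String key) { return root.get().get(key); }

    int size() { return root.get().size(); }

    // updateAndGet retries the function if another writer swapped the map in the meantime,
    // so concurrent writers to different keys do not lose each other's updates.
    void put(String key, String valueHash) {
        root.updateAndGet(current -> {
            Map<String, String> next = new HashMap<>(current);
            next.put(key, valueHash);
            return Collections.unmodifiableMap(next);
        });
    }

    // Keys can be dropped at any time, e.g. when a consumer is removed from a running instance.
    void remove(String key) {
        root.updateAndGet(current -> {
            Map<String, String> next = new HashMap<>(current);
            next.remove(key);
            return Collections.unmodifiableMap(next);
        });
    }
}
```

Each write copies the map, which is the small performance cost mentioned above; it stays cheap as long as the number of keys is small and the root is not set too often.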
Is it because having more than 1 root would mess with the structure of Etch too much?
Currently, peers and the Shell probably don't need to write too often. However, it would be more future-proof not to assume that, especially if we diversify to Orbit-like, torrent-like ideas and whatnot.
I'm not concerned so much about the structure of Etch but having a sane API. Some considerations:
- If you have numbered slots, how do you allocate these? If you open an arbitrary Etch database, you wouldn't have much of a clue what they are.
- How might you reconfigure, add new peers to a running instance etc?
A fixed number of slots looks like it could cause trouble, so a map is probably a better fit?
Adding/removing "slots" over time is a solid concern. You have convinced me 👍