Secondary root hash in Etch for embedded peers

Status: Open · helins opened this issue 4 years ago · 12 comments

I was wondering how sensible it would be to add an extra secondary root. This would allow an application embedding a peer to share the same Etch instance with the peer itself. I recall @pedrorgirardi expressed a desire to use Etch instead of Datalevin for convex.world; this would make that kind of use case more efficient.

helins, Oct 27 '21

My thought was that you can effectively put any structure you like in the root hash slot: it could be a map with unique fields for each application, for example. As long as the application and peer implementation agree on usage of the root hash, it should work fine.

mikera, Oct 27 '21
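A minimal sketch of the map-as-root idea, in Java since Convex is a Java codebase. `CompositeRoot`, `with` and the string keys are hypothetical names for illustration, not part of the Etch or Convex API; the sketch only models the agreed-upon root value, not how Etch persists it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not the Etch/Convex API): the single root value is an
// immutable map from an application id to that application's own state.
final class CompositeRoot {
    private final Map<String, Object> entries;

    CompositeRoot(Map<String, Object> entries) {
        // Unmodifiable copy; keys and values must be non-null (Map.copyOf rejects nulls).
        this.entries = Map.copyOf(entries);
    }

    static CompositeRoot empty() {
        return new CompositeRoot(Map.of());
    }

    // Returns null if this application has not stored any state yet.
    Object get(String appId) {
        return entries.get(appId);
    }

    // Copy-on-write: returns a new root and leaves this one untouched,
    // mirroring how an immutable root value would be replaced in the store.
    CompositeRoot with(String appId, Object state) {
        Map<String, Object> next = new HashMap<>(entries);
        next.put(appId, state);
        return new CompositeRoot(next);
    }
}
```

Each party only reads and writes its own key, so the peer and the embedding application only have to agree on key names, not on each other's state format.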

Yes, but the current peer implementation, for instance, writes directly to the root and there is nothing you can do about it. Either this part should be more flexible (by providing a mapping function?), or there should be the possibility of having more than one root, so that parties that do not need to coordinate don't have to (the peer doesn't care about application state and vice versa).

helins, Nov 04 '21

Well, you could have a map of {app-id1 app-data1 ...} stored as the root. It's really up to the peer implementation to decide what format is used; this is outside the protocol spec.

We could make this pluggable, I guess: have a default implementation with the current behaviour that you can override if you like?

mikera, Nov 08 '21
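One way the pluggable approach could look, as a sketch under assumptions: `RootPersistence`, `WholeRootPersistence` and `KeyedRootPersistence` are hypothetical names, and the root is modelled as a plain Java value rather than an Etch cell. The default reproduces the current behaviour (the peer owns the whole root); the override keeps the peer's state under a single key of a map-shaped root.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical strategy for how a peer stores its state in the single root.
// Not an existing Convex interface.
interface RootPersistence {
    Object read(Object currentRoot);                     // extract peer state from the root
    Object write(Object currentRoot, Object peerState);  // produce the new root value
}

// Default: the peer owns the whole root (the current behaviour).
final class WholeRootPersistence implements RootPersistence {
    @Override public Object read(Object currentRoot) { return currentRoot; }
    @Override public Object write(Object currentRoot, Object peerState) { return peerState; }
}

// Override: the peer keeps its state under one key, leaving other keys
// (e.g. application data) untouched.
final class KeyedRootPersistence implements RootPersistence {
    private final String key;

    KeyedRootPersistence(String key) { this.key = key; }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> asMap(Object root) {
        if (root instanceof Map) {
            return (Map<String, Object>) root;
        }
        return Map.of(); // a non-map root holds no keyed state
    }

    @Override public Object read(Object currentRoot) {
        return asMap(currentRoot).get(key);
    }

    @Override public Object write(Object currentRoot, Object peerState) {
        Map<String, Object> next = new HashMap<>(asMap(currentRoot));
        next.put(key, peerState); // peerState must be non-null for Map.copyOf
        return Map.copyOf(next);
    }
}
```

A peer would be handed one of these at construction time, so embedding applications that never opt in see no change in behaviour.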

Indeed, peers can take many liberties, but the default implementation should be as flexible as possible. No one is going to build a peer from scratch, now or in the envisioned future :)

helins, Nov 08 '21

OK, well, I'm pretty happy with the proposal to let people customise the root hash persistence if they wish. Also remember there's nothing to stop someone storing their own root hash elsewhere; they don't need to use the built-in root hash facility.

mikera, Nov 08 '21

It would be helpful to prioritize this for two reasons:

  • convex-web uses Datalevin, whereas sharing an Etch instance with the peer server looks like it would be enough. Datalevin relies on illegal reflective access, so we cannot upgrade to Java 17, and it uses native bindings, which forces me to run the JVM under x86 emulation, making development quite slow.

  • In the future, Convex Shell will need to share an Etch instance when it runs a peer server.


The ideal solution would allow specifying a number of roots when creating an instance (defaulting to 1). When creating a peer server, one could specify a "root index" to use (defaulting to 0).

If I'm not mistaken, and given a discussion we had some time ago, this would allow any concurrent in-process use of Etch without having to synchronize anything at the root level (each consumer writes to its own root).

helins, Oct 16 '22
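The numbered-roots proposal could be modelled roughly like this. It is a hypothetical in-memory sketch (`SlottedRoots` is not an Etch class and says nothing about Etch's on-disk format); `AtomicReferenceArray` gives each slot independent atomic reads and writes, which corresponds to "each consumer writes to its own root".

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch of "N roots": the slot count is fixed when the store
// is created, and each consumer reads and writes only its own index.
final class SlottedRoots {
    private final AtomicReferenceArray<Object> slots;

    SlottedRoots(int count) {
        if (count < 1) throw new IllegalArgumentException("need at least one root");
        slots = new AtomicReferenceArray<>(count); // all slots start as null
    }

    // Default: a single root, matching today's behaviour.
    SlottedRoots() { this(1); }

    // A peer server would default to index 0.
    Object getRoot(int index) { return slots.get(index); }

    // Writes to different indices never contend with one another.
    void setRoot(int index, Object value) { slots.set(index, value); }

    int size() { return slots.length(); }
}
```

The trade-off is visible in the constructor: the number of slots is decided up front, and an index carries no information about who owns it.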

Would it work if the root was a map, and you could pick an arbitrary key? (presumably peers would use their unique public key)

mikera, Oct 20 '22

Yes, totally fine from a user perspective. The intent is for the different in-process consumers sharing a store to each have their own "place" for remembering their state.

However, wouldn't a map mean that a locking mechanism has to be put in place? And a small performance hit from having to deal with a map.

helins, Oct 20 '22

I'm working on the assumption that you probably shouldn't set the root too often, so locking and map performance hit is probably OK?

mikera, Oct 20 '22
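On the locking question: if the root is an immutable map, updates can use compare-and-set instead of an explicit lock. Below is a hypothetical sketch (`KeyedRoot` is illustrative, and an `AtomicReference` stands in for the store's root slot); keys would presumably be something like a peer's public key. Every write copies the map, which is the small cost discussed above and is acceptable if writes are infrequent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical sketch: one root holding an immutable map keyed by owner id.
final class KeyedRoot {
    private final AtomicReference<Map<String, Object>> root =
            new AtomicReference<>(Map.of());

    Object get(String owner) {
        return root.get().get(owner);
    }

    // Replaces only this owner's entry. Concurrent writers may force a retry,
    // but no writer's update is lost. Values must be non-null.
    void put(String owner, Object value) {
        update(m -> {
            Map<String, Object> next = new HashMap<>(m);
            next.put(owner, value);
            return Map.copyOf(next);
        });
    }

    // Owners can also be removed at runtime without reallocating anything.
    void remove(String owner) {
        update(m -> {
            Map<String, Object> next = new HashMap<>(m);
            next.remove(owner);
            return Map.copyOf(next);
        });
    }

    // Lists who currently has state in the root.
    Set<String> owners() {
        return root.get().keySet();
    }

    // updateAndGet retries the function on contention, so it must be side-effect free.
    private void update(UnaryOperator<Map<String, Object>> f) {
        root.updateAndGet(f);
    }
}
```

Adding a new peer to a running instance is just another `put`, and `owners()` makes the contents self-describing to someone opening the store.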

Is it because having more than one root would mess up the structure of Etch too much?

Currently, peers and the Shell probably don't need to write very often. However, it would be more future-proof not to assume that, especially if we diversify into Orbit-like or torrent-like ideas and whatnot.

helins, Oct 20 '22

I'm not so concerned about the structure of Etch as about having a sane API. Some considerations:

  • If you have numbered slots, how do you allocate these? If you open an arbitrary Etch database, you wouldn't have much of a clue what they are.
  • How might you reconfigure, add new peers to a running instance, etc.?

A fixed number of slots looks like it could cause trouble, so a map is probably a better fit?

mikera, Oct 20 '22

Adding/removing "slots" over time is a solid concern. You have convinced me 👍

helins, Oct 20 '22