hitchhiker-tree
hitchhiker-tree copied to clipboard
Cleanup and factorize code base
The code base is currently not very accessible and badly factorized. We should go through all the comments and documentation of the codebase and collect it. We should also move all IO related code (including serialization) into separate namespaces and ideally break this out into separate repositories.
A proposal for this:
- [ ] move protocols in separate ns
- [ ] move main impl of hh-tree in new ns ("core" means nothing)
- [ ] move testing/stub related code in... test dir
- [ ] 1 repo, multiple deploy targets? move all the serialization/io related code into separate clj projects
I was thinking about something like that:
../hitchhiker-tree/src/hitchhiker/tree/protocols.clj
../hitchhiker-tree/src/hitchhiker/tree/impl/messaging.clj
../hitchhiker-tree/src/hitchhiker/tree/impl/tree.clj
../hitchhiker-tree/src/hitchhiker/tree.clj* ;; default api surface using using proto+impl/*, so a single requires brings both read/write interfaces
Then we can imagine having subprojects (with their own project.clj), potentially in the same repo but with different deploy targets (artifacts), a bit like I did here: https://github.com/mpenet/alia/tree/master/modules (or https://github.com/arrdem/katamari or even ring https://github.com/ring-clojure/ring): basically it adds up to the same namespace hierarchy with new functionality but via multiple dependencies.
../codec/src/...
../redis/src/hitchhiker/tree/redis.clj /redis/src/hitchhiker/tree/redis/*.clj...
../konserve/src/hitchhiker/tree/konserve.clj /konserve/src/hitchhiker/tree/konserve/*.clj
few other things:
- the more I deal with that codebase the more I wonder if it's worth keeping the sync version around, I am curious to hear about the arguments in favor.
- exceptions handling, if we get rid of the sync interface we can also simplify error handling: datomic async interface returns maps with https://github.com/cognitect-labs/anomalies tagged maps for errors values in async context, which is interesting. I personally like the idea of using a hmap with a :type key with a keyword as value, bound to a custom hierarchy potentially , + other data to describe errors, then just return it in async context or (throw (ex-info "Something bad" {:type ::something-bad :more "info"})) in sync. That allows to build a cross platform (custom) hierarchy of errors if needed a bit like what I use here: https://github.com/mpenet/ex that can be used/queried in async/sync easily.
I like the refactoring proposal. I would 1. collect all the comments that are documentation (e.g. in core.cljc) and move them to some document, e.g. wiki or into the doc folder. 2. Do your proposed refactoring for protocols and 3. factor out the IO backends. I like the idea of keeping them in the same repo for now, but I have not a very strong opinion on that.
-
We can move to the async version completely, in case of core.async we only have to be careful not to block threads on IO in our backends. I remember it was a bit slower than the synchronous version on the JVM though. I am not sure whether your opinion is still the same after the Telegram discussion. In general async IO should allow to scale to more parallel clients running queries against the DB, but this is speculation.
-
Regarding error handling I have tried to do the :type + namespaced keyword approach in most of replikativ with ex-info. I like it because it is still using the default exception mechanism, but we can also do something else. I am not really familiar with the anomalies yet.