datahike
datahike copied to clipboard
Add specs for the `datahike.api` namespace
This PR aims to add type signatures (as Clojure Specs) for the whole datahike.api
namespace. ~It's marked as WIP because we're not there yet.~
A few notes and questions:
- I spent some time evaluating malli as well, but went with specs in the end due to the wider ecosystem support: we in fact use Orchestra (which is notably integrated in Kaocha) for instrumentation and pretty errors, and spec-tools to be able to declare record specs in a less awkward way (and closer to literature) than with vanilla specs (I can ditch
spec-tools
though if you'd not like to pull that in, as the existing specs don't use it) - There are some failing tests due to the tests passing varargs as configuration, which I understand it's deprecated at this point. This leads to issues like
:initial-tx
being accepted as a key in theConfig
map and as a vararg in certain functions, which as a user might be surprising and raise questions around precedence. So the main question before I can move forward is: should we spec these varargs? Doing so might be fairly time consuming, and not doing so will require changing the tests to not use them. We wouldn't require a breaking change removing them right away, but these code paths will not be exercised anymore.
A few more notes:
- I noticed there are already some specs already for schema creation and config, and I intend to reuse them, but I feel we should still consider the varargs question now, as they are deprecated since
0.3.0
and this is also about "incentivizing usage", as specs also act as documentation - adding the rest of the specs should be fairly mechanical once we evaluate this first draft, as the main hurdle at this point was setting up instrumentation for type signatures
- how would you feel about using
defn-spec
?
would it be feasible to add an ADR for this? I would love to read about why and how we solve this problem and later probably again some time. Update: Probably not...was not reading properly. This is maybe not a change that needs an ADR:)
Should this be a draft PR for now?
@TimoKramer Is this the form of spec'ing you would also have expected for the client?
I thought about that. Probably it is possible to generate openapi-spec from that. For datahike-server I want to have a openapi-spec for the rest-api to generate routes and validation automatically and auto-generate clients based on that as well.
Should this be a draft PR for now?
@hellerve I forgot to make it a draft at first and now it looks like I cannot convert it to draft anymore, so I just prepended WIP to the title for now
Probably it is possible to generate openapi-spec from that. For datahike-server I want to have a openapi-spec for the rest-api to generate routes and validation automatically and auto-generate clients based on that as well.
@TimoKramer yep, spec-tools supports generating an OpenAPI document from specs, so once all the API surface is specced then generating clients from there is easy
Is this the form of spec'ing you would also have expected for the client?
Right, I wanted to add a note about this: the specs I'm adding are Clojure specific, e.g. there's the fact that we return an atom
from connect
- if we are to use this layer as a "common API" for clients then this is not good: obviously the objects returned from a REST API should be "simple data" (since everything has to be encoded to JSON or similar format). I am not sure how a client should look like, so I don't have an answer for this yet
@f-f I added a dedicated Connection
type in my PR last night here https://github.com/replikativ/datahike/pull/332/files#diff-24742157a2cef9db99aa63817dda341a1d110787835b4a18ac27aa95c9bcc529R25. It basically gives us control over the deref
erencing operation and can be updated by transact through swap!
. There are still some other operations in the codebase operating on connections, such as resetting them, but can probably remove those. A third functionality that Clojure's Atom
s provide is metadata that we use for listeners on changes (callbacks).
@f-f I thought a little bit about the varargs and moving forward I'm not in favor for them and I'm refactoring the tests anyway, so you can go ahead and not spec them.
Regarding defn-spec
I think it looks good and declutters a file with many definitions. On the other hand, people not familiar with it would be confused looking at the functions. Do you know a project that uses it?
@kordano great news! Do you think I should wait for the test refactoring before moving forward with this? It might be actually necessary if we don't want to spec these varargs now, since I'm instrumenting the specs at test time and tests will just fail if they don't conform to the "type signatures".
In the meantime I stumbled on the very dynamic q
, that seems to be hard to spec cleanly since it accepts so many things. I'm thinking how to deal with that, but it might be fine to have a more "superficial" speccing for that right now, and then do a deeper refactoring later (that would also reuse these specs to conform the arguments, reduce duplication, and generally aid implementation)
Re defn-spec
: I don't know of any projects using it, and while it looks like a neat idea, I'm leaning more towards just using the vanilla s/fdef
, to save on the novelty budget.
This is now ready for review! I feel like I'm fairly happy with where I got here, and tests are now passing locally. (I suspect CI should run, but I also suspect someone authorized to merge should press the button to run it since it's my first contribution here)
Final report:
- the overview of the patch is that now all documented functions in the
datahike.api
namespace also ship with a Clojure Spec attached to them, defined throughclojure.spec.alpha/fdef
. These specs are then checked at test-running-time and in development, and not checked at all in production code - the specs are accurate only to the extent of where we exercise the function calls in tests - e.g. some functions might return
nil
, but since they never do in tests then this behaviour is not specced. So expect these specs to be imperfect and need tweaks, at least in the beginning (fortunately error messages are very nice and point exactly to where issues are) - however, the function arguments and return values are specced to a degree that we deemed satisfactory when discussing it with @whilo and @kordano, but there are some places where we could spec things more in detail - such as the database schema, for which we already have a spec. We don't do this here since this only intends to be a first pass. I left some
TODO
s in some places where I think there could be improvements in the future - contrary to a previous suggestion in the thread the deprecated varargs are specced, but there's some duplication between the specs we use in the
fdefs
and the existing ones for things like the database config. They should be unified, but since they serve different purposes - the existing one is to conform the arguments to the function call to a predictable data structure - I'd prefer to defer this work to a later moment where we depend a bit less on the varargs. - we add a new
datahike.spec
namespace here, which I suspect will be documented bycljdoc
- do we want to tweak the docs in any way?
This work will be updated and integrated with PR #596