datahike icon indicating copy to clipboard operation
datahike copied to clipboard

Add specs for the `datahike.api` namespace

Open f-f opened this issue 3 years ago • 11 comments

This PR aims to add type signatures (as Clojure Specs) for the whole datahike.api namespace. ~It's marked as WIP because we're not there yet.~

A few notes and questions:

  • I spent some time evaluating malli as well, but went with specs in the end due to the wider ecosystem support: we in fact use Orchestra (which is notably integrated in Kaocha) for instrumentation and pretty errors, and spec-tools to be able to declare record specs in a less awkward way (and closer to literature) than with vanilla specs (I can ditch spec-tools though if you'd not like to pull that in, as the existing specs don't use it)
  • There are some failing tests due to the tests passing varargs as configuration, which I understand it's deprecated at this point. This leads to issues like :initial-tx being accepted as a key in the Config map and as a vararg in certain functions, which as a user might be surprising and raise questions around precedence. So the main question before I can move forward is: should we spec these varargs? Doing so might be fairly time consuming, and not doing so will require changing the tests to not use them. We wouldn't require a breaking change removing them right away, but these code paths will not be exercised anymore.

f-f avatar May 17 '21 18:05 f-f

A few more notes:

  • I noticed there are already some specs already for schema creation and config, and I intend to reuse them, but I feel we should still consider the varargs question now, as they are deprecated since 0.3.0 and this is also about "incentivizing usage", as specs also act as documentation
  • adding the rest of the specs should be fairly mechanical once we evaluate this first draft, as the main hurdle at this point was setting up instrumentation for type signatures
  • how would you feel about using defn-spec?

f-f avatar May 17 '21 18:05 f-f

would it be feasible to add an ADR for this? I would love to read about why and how we solve this problem and later probably again some time. Update: Probably not...was not reading properly. This is maybe not a change that needs an ADR:)

TimoKramer avatar May 18 '21 12:05 TimoKramer

Should this be a draft PR for now?

hellerve avatar May 18 '21 18:05 hellerve

@TimoKramer Is this the form of spec'ing you would also have expected for the client?

whilo avatar May 20 '21 22:05 whilo

I thought about that. Probably it is possible to generate openapi-spec from that. For datahike-server I want to have a openapi-spec for the rest-api to generate routes and validation automatically and auto-generate clients based on that as well.

TimoKramer avatar May 21 '21 07:05 TimoKramer

Should this be a draft PR for now?

@hellerve I forgot to make it a draft at first and now it looks like I cannot convert it to draft anymore, so I just prepended WIP to the title for now

Probably it is possible to generate openapi-spec from that. For datahike-server I want to have a openapi-spec for the rest-api to generate routes and validation automatically and auto-generate clients based on that as well.

@TimoKramer yep, spec-tools supports generating an OpenAPI document from specs, so once all the API surface is specced then generating clients from there is easy

f-f avatar May 21 '21 15:05 f-f

Is this the form of spec'ing you would also have expected for the client?

Right, I wanted to add a note about this: the specs I'm adding are Clojure specific, e.g. there's the fact that we return an atom from connect - if we are to use this layer as a "common API" for clients then this is not good: obviously the objects returned from a REST API should be "simple data" (since everything has to be encoded to JSON or similar format). I am not sure how a client should look like, so I don't have an answer for this yet

f-f avatar May 21 '21 15:05 f-f

@f-f I added a dedicated Connection type in my PR last night here https://github.com/replikativ/datahike/pull/332/files#diff-24742157a2cef9db99aa63817dda341a1d110787835b4a18ac27aa95c9bcc529R25. It basically gives us control over the dereferencing operation and can be updated by transact through swap!. There are still some other operations in the codebase operating on connections, such as resetting them, but can probably remove those. A third functionality that Clojure's Atoms provide is metadata that we use for listeners on changes (callbacks).

whilo avatar May 21 '21 19:05 whilo

@f-f I thought a little bit about the varargs and moving forward I'm not in favor for them and I'm refactoring the tests anyway, so you can go ahead and not spec them. Regarding defn-spec I think it looks good and declutters a file with many definitions. On the other hand, people not familiar with it would be confused looking at the functions. Do you know a project that uses it?

kordano avatar May 26 '21 15:05 kordano

@kordano great news! Do you think I should wait for the test refactoring before moving forward with this? It might be actually necessary if we don't want to spec these varargs now, since I'm instrumenting the specs at test time and tests will just fail if they don't conform to the "type signatures".

In the meantime I stumbled on the very dynamic q, that seems to be hard to spec cleanly since it accepts so many things. I'm thinking how to deal with that, but it might be fine to have a more "superficial" speccing for that right now, and then do a deeper refactoring later (that would also reuse these specs to conform the arguments, reduce duplication, and generally aid implementation)

Re defn-spec: I don't know of any projects using it, and while it looks like a neat idea, I'm leaning more towards just using the vanilla s/fdef, to save on the novelty budget.

f-f avatar May 26 '21 23:05 f-f

This is now ready for review! I feel like I'm fairly happy with where I got here, and tests are now passing locally. (I suspect CI should run, but I also suspect someone authorized to merge should press the button to run it since it's my first contribution here)

Final report:

  • the overview of the patch is that now all documented functions in the datahike.api namespace also ship with a Clojure Spec attached to them, defined through clojure.spec.alpha/fdef. These specs are then checked at test-running-time and in development, and not checked at all in production code
  • the specs are accurate only to the extent of where we exercise the function calls in tests - e.g. some functions might return nil, but since they never do in tests then this behaviour is not specced. So expect these specs to be imperfect and need tweaks, at least in the beginning (fortunately error messages are very nice and point exactly to where issues are)
  • however, the function arguments and return values are specced to a degree that we deemed satisfactory when discussing it with @whilo and @kordano, but there are some places where we could spec things more in detail - such as the database schema, for which we already have a spec. We don't do this here since this only intends to be a first pass. I left some TODOs in some places where I think there could be improvements in the future
  • contrary to a previous suggestion in the thread the deprecated varargs are specced, but there's some duplication between the specs we use in the fdefs and the existing ones for things like the database config. They should be unified, but since they serve different purposes - the existing one is to conform the arguments to the function call to a predictable data structure - I'd prefer to defer this work to a later moment where we depend a bit less on the varargs.
  • we add a new datahike.spec namespace here, which I suspect will be documented by cljdoc - do we want to tweak the docs in any way?

f-f avatar Jul 28 '21 12:07 f-f

This work will be updated and integrated with PR #596

jsmassa avatar Jan 11 '23 21:01 jsmassa