datahike
Datom and index metrics
SUMMARY
Add datom and index metrics
Checks
Feature
- [ ] Implements an existing feature request. Make sure the feature request has been accepted for implementation before opening a PR.
- [ ] Related issues linked using `fixes #number`
- [ ] Integration tests added
- [ ] Documentation added
- [ ] Architecture Decision Record added
- [ ] Formatting checked
- [ ] `CHANGELOG.md` updated
Does it make sense to provide count and index metrics on database records besides DB (HistoricalDB and FilteredDB at the moment)? I thought it could be useful, so `-metrics` is in the `IDB` protocol for now, allowing it to be implemented across database records.
This lacks readability. I would really encourage you to use more threading macros, and don't hesitate to put expressions on a new line when things get too nested. Also, within a hash-map, even if it is just to keep some order, please stick to either putting each value next to its key or below it; don't mix the two!
Then again, why do all that instead of simply counting the values of `(group-by :a)` and `(group-by :e)` on `(-seq (.-eavt db))` (invoked here anyway) and its temporal equivalent? This buys neither memory nor performance, I think. Am I missing or forgetting anything?
What do you mean? Do you want to somehow include it in the `-seq` function, or count everything while stepping through the sequence just once?
I mean something like this (abridged for clarity):

```clojure
(let [eavt-datoms (-seq (.-eavt db))]
  {:per-attr-counts   (->> (vals (group-by :a eavt-datoms))
                           (map count))
   :per-entity-counts (->> (vals (group-by :e eavt-datoms))
                           (map count))})
```
Sure, you can write it like that, but I don't think it has performance benefits either: I would expect it to still step through the sequence twice, once for each of the grouping operations. To step through it only once, you could implement a big `reduce` operation with the empty count-map as the initial value and then set or increase the counts with `assoc-in` or `update-in` operations.
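A minimal sketch of that single-pass idea (illustrative only, not the PR's implementation; it assumes the datoms support keyword lookup of `:e` and `:a`, as the `group-by` snippets above already do, and reuses the key names from the earlier example):

```clojure
;; One reduce over the datoms, updating both count maps in a single pass.
;; `datoms` is assumed to be the sequence returned by (-seq (.-eavt db)).
(defn single-pass-counts [datoms]
  (reduce (fn [acc datom]
            (-> acc
                (update-in [:per-attr-counts (:a datom)] (fnil inc 0))
                (update-in [:per-entity-counts (:e datom)] (fnil inc 0))))
          {:per-attr-counts {} :per-entity-counts {}}
          datoms))
```

`(fnil inc 0)` treats a missing count as 0, so no pre-initialization of the inner maps is needed.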
It definitely makes sense to implement one function and then call it for the current and for the temporal indices.
I also think this implementation will not result in what you actually wanted to get. By calling `vals`, you lose the information of which attribute each count corresponds to. I think what you want is something like:
```clojure
(->> eavt-datoms
     (group-by :e)
     (map-vals count))
```
(The `map-vals` function is not part of the Clojure core library; it can be pulled in with the medley library or written yourself.)
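If you would rather not add the medley dependency, a hand-rolled version is short (a sketch; medley's own `map-vals` differs in details):

```clojure
;; A minimal map-vals: applies f to every value of map m.
(defn map-vals [f m]
  (reduce-kv (fn [acc k v] (assoc acc k (f v))) {} m))

(map-vals count {:a [1 2] :b [3]})
;; => {:a 2, :b 1}
```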
Btw, note how I am using even more lines of code here for the threading macro. This is the easiest to read: start with the object you want to transform on the first line, then list each transformation from top to bottom, each on its own line, even if it is something short like `vals`.