datahike
Datom and index metrics
SUMMARY
Add datom and index metrics
Checks
Feature
- [ ] Implements an existing feature request. Make sure the feature request has been accepted for implementation before opening a PR.
- [ ] Related issues linked using `fixes #number`
- [ ] Integration tests added
- [ ] Documentation added
- [ ] Architecture Decision Record added
- [ ] Formatting checked
- [ ] `CHANGELOG.md` updated
Does it make sense to provide count and index metrics on database records besides DB (HistoricalDB and FilteredDB at the moment)? I thought it could be useful, so `-metrics` is in the `IDB` protocol for now, allowing it to be implemented across database records.
This lacks readability. I would really encourage you to use more threading macros, and don't hesitate to put expressions on a new line when things get too nested. Also, within a hash-map, even if it is just to keep some order, please stick to either putting each value next to its key or below it; don't mix the two!
Then again, why do all that instead of simply counting the values of `(group-by :a)` and `(group-by :e)` on `(-seq (.-eavt db))` (invoked here anyway) and its temporal equivalent? This buys neither memory nor performance, I think. Am I missing or forgetting anything?
What do you mean? Do you want to somehow include it in the `-seq` function, or count everything while stepping through the sequence just once?
I mean something like this (abridged for clarity):

```clojure
(let [eavt-datoms (-seq (.-eavt db))]
  {:per-attr-counts   (->> (vals (group-by :a eavt-datoms))
                           (map count))
   :per-entity-counts (->> (vals (group-by :e eavt-datoms))
                           (map count))})
```
Sure, you can write it like that, but I don't think it has performance benefits either: I would expect it to still step through the sequence twice, once for each of the grouping operations. To step through it only once, you could implement a big `reduce` operation with the empty count-map as the initial value and then set or increase the counts with `assoc-in` or `update-in` operations.
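A minimal sketch of that single-pass idea (illustrative only, not the PR's implementation; it assumes the datoms support keyword lookup of `:e` and `:a`, as the `group-by` snippets above already do, and reuses the key names from the earlier example):

```clojure
;; One reduce over the datoms, updating both count maps in a single pass.
;; `datoms` is assumed to be the sequence returned by (-seq (.-eavt db)).
(defn single-pass-counts [datoms]
  (reduce (fn [acc datom]
            (-> acc
                (update-in [:per-attr-counts (:a datom)] (fnil inc 0))
                (update-in [:per-entity-counts (:e datom)] (fnil inc 0))))
          {:per-attr-counts {} :per-entity-counts {}}
          datoms))
```

`(fnil inc 0)` treats a missing count as 0, so no pre-initialization of the inner maps is needed.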
It definitely makes sense to implement one function and then call it for the current and for the temporal indices.
I also think this implementation will not result in what you actually wanted to get. By calling `vals`, you lose the information of which attribute each count corresponds to. I think what you want is something like:
```clojure
(->> eavt-datoms
     (group-by :e)
     (map-vals count))
```
(The `map-vals` function is not part of the Clojure core library; it can be pulled in with the medley library or written yourself.)
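If you would rather not add the medley dependency, a hand-rolled version is short (a sketch; medley's own `map-vals` differs in details):

```clojure
;; A minimal map-vals: applies f to every value of map m.
(defn map-vals [f m]
  (reduce-kv (fn [acc k v] (assoc acc k (f v))) {} m))

(map-vals count {:a [1 2] :b [3]})
;; => {:a 2, :b 1}
```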
Btw, note how I am using even more lines of code here for the threading macro. This is the easiest to read: start with the object you want to transform on the first line, then list each transformation from top to bottom, each on its own line, even if it is something short like `vals`.