idx
idx copied to clipboard
Efficient count of lookup result?
I work on a stream implementation of datalog. I use idx to keep sets of datoms inmemory and query sub-sets of sets efficiently. These sets are joint with each other to comput a result set for a query. The simplest algo for join is a carthesian product of all sets and then filter for valid joins of datoms. This implementation is very inefficient, as the carthesian product grows exponentially.
The worst case optimal join solves this issue for many real world cases. The algo joins the smallest sets first. Right now I use (count set) to approximate the set size (because count on set is constant time). What would be much better is when (count (idx/lookup set)) would be constant thus (= (counted? (idx/lookup set 1 :attribute)) true), then I could use the actual sub-set to compute next.
What do you think?
ValSeq is unfortunately not counted, we would be looking at a change to the implementation, keeping a seq return, and performance in the same ballpark. I'll think about this, I also would be happy to accept any work in this area as a patch or merge request.