clj-fast
clj-fast copied to clipboard
May not be in scope, but fast detection of set intersection
I ran into a use case for a work product that needed to quickly test membership between 2 sets to see if "any" member existed in both, and return the first intersecting member as fast as possible.
A naive way one might reach for is
(defn slow-some-member [s1 s2]
(first (clojure.set/intersection s1 s2)))
Original implementation was
(defn some-member [s1 s2]
(let [[l r] (if (< (count s1) (count s2)) [s1 s2]
[s2 s1])]
(reduce (fn [acc x]
(if (r x)
(reduced x)
acc)) nil l)))
Which of course has some inefficiencies due to destructuring and slow function calls through clojure.lang.RT.
Revised version assumes sets as input,
(defn some-member2 [^clojure.lang.APersistentSet s1
^clojure.lang.APersistentSet s2]
(let [l (if (< (.count s1) (.count s2)) s1
s2)
r (if (identical? l s1) s2 s1)]
(reduce (fn [acc x]
(if (r x)
(reduced x)
acc)) nil l)))
user> (let [s1 #{:a :b } s2 #{:g :a}] (c/quick-bench (slow-some-member s1 s2)))
Evaluation count : 975378 in 6 samples of 162563 calls.
Execution time mean : 612.287581 ns
Execution time std-deviation : 3.310927 ns
Execution time lower quantile : 608.261646 ns ( 2.5%)
Execution time upper quantile : 615.629475 ns (97.5%)
Overhead used : 2.233057 ns
nil
user> (let [s1 #{:a :b } s2 #{:g :a}] (c/quick-bench (some-member s1 s2)))
Evaluation count : 2004582 in 6 samples of 334097 calls.
Execution time mean : 300.056678 ns
Execution time std-deviation : 2.946006 ns
Execution time lower quantile : 297.321682 ns ( 2.5%)
Execution time upper quantile : 303.695844 ns (97.5%)
Overhead used : 2.233057 ns
nil
user> (let [s1 #{:a :b } s2 #{:g :a}] (c/quick-bench (some-member2 s1 s2)))
Evaluation count : 3298134 in 6 samples of 549689 calls.
Execution time mean : 181.087016 ns
Execution time std-deviation : 0.816367 ns
Execution time lower quantile : 180.408265 ns ( 2.5%)
Execution time upper quantile : 182.115366 ns (97.5%)
Overhead used : 2.233057 ns
nil
Gains are variable, and I haven't studied them for a wide range of set combinations. I'm interested in any faster means of doing this, although I just realized that my specific use case is amenable to memoization, which improves runtime (via memo-2) like 10x over my original "optimized" some-member
.
user> (let [f (memo-2 some-member2) s1 #{:a :b } s2 #{:g :a}] (c/quick-bench (f s1 s2)))
Evaluation count : 19762416 in 6 samples of 3293736 calls.
Execution time mean : 28.191368 ns
Execution time std-deviation : 0.181916 ns
Execution time lower quantile : 28.035774 ns ( 2.5%)
Execution time upper quantile : 28.427699 ns (97.5%)
Overhead used : 2.233057 ns