medley
medley copied to clipboard
Suggested addition: generalization of group-by/index-by
Not infrequently I find myself wanting to to group a collection with a custom collation. Between medley and clojure.core we already have two functions with hard-coded collations: group-by uses conj on a vector, and index-by uses last-wins collation. Both functions could be implemented in terms of a more general one that also lets devs provide their own collations:
;; Placeholder name, this could even be additional arities of index-by
;; Naive implementation, not optimized
(defn group-with
"key fn, collating fn, initial value, coll"
([kf cf coll]
(reduce #(let [k (kf %2)]
(assoc %1 k (if-some [v (get %1 k)]
(cf v %2)
%2)))
{} coll))
([kf cf init coll]
(reduce #(update %1 (kf %2) (fnil cf init) %2) {} coll)))
Here are some usage examples that compare implementations with and without this function:
(def data
["apple" "banana" "clojure" "aardvark"])
;; Implementing group-by in terms of group-with
(group-by first data)
(group-with first conj [] data)
;; What if you want to use a different collection type?
(reduce #(update %1 (first %2) (fnil conj #{}) %2) {} data)
(-> (group-by first data) (update-vals set)) ;more clear, but inefficient
(group-with first conj #{} data)
;; First, defining a helper. This might also be a useful addition to medley, I'll file a separate issue if you're interested
(defn unapply
"The opposite of apply: (unapply f [a b] c d) -> (f a b [c d])"
[f & args]
(if (and (vector? (first args)) (seq (rest args)))
((apply partial f (first args)) (rest args))
(f args)))
;; Implementing index-by in terms of group-with
(into {} (map (juxt first identity)) data)
(-> (group-by first data) (update-vals last)) ;more clear, but inefficient
(reduce #(assoc %1 (first %2) %2) {} data)
(m/index-by first data) ;obviously index-by is the most concise
(group-with first (fn [_ b] b) data)
(group-with first (partial unapply second) data) ;unapply helps this read a little more clearly
;; What if you want first-wins collation instead of index-by's last-wins collation?
(reduce #(if (get %1 (first %2)) %1 (assoc %1 (first %2) %2)) {} data) ;trickier because you have to deal with the initial nil value case
(reduce #(update %1 (first %2) (fn [a b] (if (some? a) a b)) %2) {} data)
(into {} (reverse (map (juxt [first identity]) data)))
(-> (group-by first data) (update-vals first)) ;more clear, but inefficient
(group-with first (fn [a _] a) data) ;much better symmetry with the last-wins case
(group-with first (partial unapply first) data)
;; What if you want to pick the winner based on some other criteria? E.g. max-key
(reduce #(assoc %1 (first %2) (max-key count %2 (get %1 (first %2)))) {} data)
(reduce #(update %1 (first %2) (partial max-key count) %2) {} data)
(-> (group-by first data) (update-vals (partial apply max-key count))) ;more clear, but inefficient
(group-with first (partial max-key count) data)
;; Similar to with first-wins collation, min-key ends up being tricky because (= (count nil) 0), whereas group-with retains a nice symmetry with the max-key version