medley icon indicating copy to clipboard operation
medley copied to clipboard

Suggested addition: generalization of group-by/index-by

Open maxrothman opened this issue 1 year ago • 0 comments

Not infrequently I find myself wanting to to group a collection with a custom collation. Between medley and clojure.core we already have two functions with hard-coded collations: group-by uses conj on a vector, and index-by uses last-wins collation. Both functions could be implemented in terms of a more general one that also lets devs provide their own collations:

;; Placeholder name, this could even be additional arities of index-by
;; Naive implementation, not optimized
(defn group-with
  "key fn, collating fn, initial value, coll"
  ([kf cf coll]
   (reduce #(let [k (kf %2)]
              (assoc %1 k (if-some [v (get %1 k)] 
                            (cf v %2)
                            %2)))
           {} coll))
  ([kf cf init coll]
   (reduce #(update %1 (kf %2) (fnil cf init) %2) {} coll)))

Here are some usage examples that compare implementations with and without this function:

(def data
  ["apple" "banana" "clojure" "aardvark"])

;; Implementing group-by in terms of group-with
(group-by first data)
(group-with first conj [] data)

;; What if you want to use a different collection type?
(reduce #(update %1 (first %2) (fnil conj #{}) %2) {} data)
(-> (group-by first data) (update-vals set))  ;more clear, but inefficient

(group-with first conj #{} data)


;; First, defining a helper. This might also be a useful addition to medley, I'll file a separate issue if you're interested
(defn unapply
 "The opposite of apply: (unapply f [a b] c d) -> (f a b [c d])"
 [f & args]
  (if (and (vector? (first args)) (seq (rest args)))
    ((apply partial f (first args)) (rest args))
    (f args)))


;; Implementing index-by in terms of group-with
(into {} (map (juxt first identity)) data)
(-> (group-by first data) (update-vals last))  ;more clear, but inefficient
(reduce #(assoc %1 (first %2) %2) {} data)
(m/index-by first data)  ;obviously index-by is the most concise

(group-with first (fn [_ b] b) data)
(group-with first (partial unapply second) data)  ;unapply helps this read a little more clearly


;; What if you want first-wins collation instead of index-by's last-wins collation?
(reduce #(if (get %1 (first %2)) %1 (assoc %1 (first %2) %2)) {} data)  ;trickier because you have to deal with the initial nil value case
(reduce #(update %1 (first %2) (fn [a b] (if (some? a) a b)) %2) {} data)
(into {} (reverse (map (juxt [first identity]) data)))
(-> (group-by first data) (update-vals first))  ;more clear, but inefficient

(group-with first (fn [a _] a) data)  ;much better symmetry with the last-wins case
(group-with first (partial unapply first) data)


;; What if you want to pick the winner based on some other criteria? E.g. max-key
(reduce #(assoc %1 (first %2) (max-key count %2 (get %1 (first %2)))) {} data)
(reduce #(update %1 (first %2) (partial max-key count) %2) {} data)
(-> (group-by first data) (update-vals (partial apply max-key count)))  ;more clear, but inefficient

(group-with first (partial max-key count) data)

;; Similar to with first-wins collation, min-key ends up being tricky because (= (count nil) 0), whereas group-with retains a nice symmetry with the max-key version

maxrothman avatar Feb 22 '24 18:02 maxrothman