`difference` should be just `anti-join`

Open jsa-aerial opened this issue 6 months ago • 0 comments

Generally, if you are going to 'difference' two datasets, it will be based on a specific shared column, or on a column from each that ranges over the same domain. So `difference` really needs at least a columns-selector in its signature. But that makes it essentially just `anti-join`, and `anti-join` appears to be used only in `difference`. So it seems that `anti-join` should simply be renamed `difference`, and the current `difference` removed or renamed to something like `deprecated-difference`.
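To make the distinction concrete, here is a minimal sketch in plain Clojure, with datasets modeled as vectors of maps rather than real tablecloth datasets. The helper `anti-join-rows` and the sample data are hypothetical and only illustrate the semantics; they are not tablecloth's implementation.

```clojure
;; Hypothetical helper, not part of tablecloth: keep the rows of `left`
;; whose values under the key columns `ks` do not occur in `right`.
(defn anti-join-rows
  [left right ks]
  (let [seen (set (map #(select-keys % ks) right))]
    (remove #(contains? seen (select-keys % ks)) left)))

(def left  [{:id 1 :v "a"} {:id 2 :v "b"} {:id 3 :v "c"}])
(def right [{:id 2 :v "x"} {:id 3 :v "c"}])

;; Keyed on :id only, which is the usual intent of a difference.
(def by-id (anti-join-rows left right [:id]))

;; Keyed on every column, which is what a whole-row difference amounts to.
(def by-row (anti-join-rows left right [:id :v]))
```

With this data, `by-id` keeps only the row with `:id 1`, while `by-row` also keeps the row with `:id 2`, because its `:v` value differs between the two sides. The whole-row form is just the keyed form with all columns selected.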

The options argument does not appear to have any public documentation and is only relevant in the non-public `semi-anti-join-indexes`. So backward compatibility can be maintained with:

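(defn difference
  ;; Two-arity form keeps the current call shape: select every column
  ;; name found in either dataset.
  ([ds-left ds-right]
   (difference ds-left ds-right
               (distinct (clojure.core/concat
                          (ds/column-names ds-left)
                          (ds/column-names ds-right)))))
  ;; Three-arity form drops the rows of ds-left that match ds-right on
  ;; the selected columns; nil is passed for the options argument.
  ([ds-left ds-right columns-selector]
   (-> (->> (semi-anti-join-indexes ds-left ds-right columns-selector nil)
            (drop-rows ds-left))
       (vary-meta assoc :name "difference"))))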

jsa-aerial, Oct 31 '25 19:10