tablecloth
`difference` should be just `anti-join`
Generally, when you 'difference' two datasets, the comparison is based on a specific shared column, or on one column from each dataset drawn from the same domain. So `difference` really needs at least a columns selector in its signature. But that makes it essentially `anti-join`, and within the library `anti-join` appears to be used only by `difference`. It therefore seems that `anti-join` should simply be renamed `difference`, and the current `difference` removed or renamed to something like `deprecated-difference`.
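To make the argument concrete, here is a small plain-data model that needs no tablecloth at all: rows are maps and a dataset is a vector of rows. The names `anti-join-rows` and `difference-rows` are hypothetical and exist only for this sketch. It shows that "difference on every column" and "difference on a chosen key column" are the same anti-join operation, differing only in the column list.

```clojure
;; Hypothetical plain-data model; these functions are NOT part of tablecloth.
;; A "dataset" here is a vector of row maps.

(defn anti-join-rows
  "Keep the rows of `left` whose values in `cols` match no row of `right`."
  [left right cols]
  (let [row-key    (fn [row] (mapv #(get row %) cols)) ; values of `cols`, in order
        right-keys (set (map row-key right))]          ; every key present in `right`
    (vec (remove #(contains? right-keys (row-key %)) left))))

(defn difference-rows
  "Two-argument form compares on every column seen in either dataset,
  mirroring the two-argument arity proposed below; the three-argument
  form is just an anti-join on the given columns."
  ([left right]
   (difference-rows left right
                    (distinct (concat (mapcat keys left) (mapcat keys right)))))
  ([left right cols]
   (anti-join-rows left right cols)))

(def a [{:id 1 :v "x"} {:id 2 :v "y"} {:id 3 :v "z"}])
(def b [{:id 2 :v "changed"} {:id 3 :v "z"}])

(difference-rows a b)        ;; all columns: [{:id 1 :v "x"} {:id 2 :v "y"}]
(difference-rows a b [:id])  ;; keyed on :id: [{:id 1 :v "x"}]
```

Row 2 survives the all-column difference because its `:v` differs, but disappears once the comparison is restricted to `:id`, which is the keyed behavior the proposal asks `difference` to support.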
The `options` argument does not appear to have any public description and is only relevant in the non-public `semi-anti-join-indexes`. So backward compatibility can be maintained with:
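(defn difference
  ;; Two-argument form: compare on every column of either dataset,
  ;; so existing two-argument calls behave as before.
  ([ds-left ds-right]
   (difference ds-left ds-right
               (distinct (clojure.core/concat
                          (ds/column-names ds-left)
                          (ds/column-names ds-right)))))
  ;; Three-argument form: drop the rows of ds-left whose values in the
  ;; selected columns match a row of ds-right (i.e. an anti-join).
  ([ds-left ds-right columns-selector]
   (-> (->> (semi-anti-join-indexes ds-left ds-right columns-selector nil)
            (drop-rows ds-left))
       (vary-meta assoc :name "difference"))))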