dedupe

consider a more sklearn-like pipeline approach

Open fgregg opened this issue 1 year ago • 1 comment

  1. break out all the active learning bits into a separate class or multiple separate classes

  2. train a blocking model using the familiar fit_transform syntax. this is a separate class that emits a stream of pairs. (is this something that could really fit into the sklearn pattern?)

  3. train a classification model using fit_transform. this takes in a stream of pairs and emits a stream of classification decisions.

actually, this all would work quite well.

https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
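
a rough sketch of how steps 2 and 3 might chain together. everything here is hypothetical: `PrefixBlocker`, `ExactMatchClassifier`, and `run_pipeline` are made-up names for illustration, not existing dedupe or sklearn classes, and the blocking and matching rules are toys with nothing to learn. the classes only imitate the fit / transform / fit_transform convention in plain Python; whether sklearn's `Pipeline` would accept them as-is is not verified here.

```python
from collections import defaultdict
from itertools import combinations


class PrefixBlocker:
    """Hypothetical blocker: groups records by a field prefix, emits candidate pairs."""

    def __init__(self, field="name", prefix_len=3):
        self.field = field
        self.prefix_len = prefix_len

    def fit(self, records, y=None):
        # A real blocker would learn its blocking rules here; this toy learns nothing.
        return self

    def transform(self, records):
        # Group record ids by the lowercased prefix of the chosen field.
        blocks = defaultdict(list)
        for record_id, record in records.items():
            key = record[self.field][: self.prefix_len].lower()
            blocks[key].append(record_id)
        # Lazily emit every pair of records that share a block.
        for ids in blocks.values():
            for a, b in combinations(sorted(ids), 2):
                yield (a, records[a]), (b, records[b])

    def fit_transform(self, records, y=None):
        return self.fit(records, y).transform(records)


class ExactMatchClassifier:
    """Hypothetical classifier: calls a pair a match when one field is identical."""

    def __init__(self, field="city"):
        self.field = field

    def fit(self, pairs, y=None):
        # A real classifier would be trained on labeled pairs; assumed away here.
        return self

    def transform(self, pairs):
        # Consume the stream of pairs and emit a stream of (id, id, decision).
        for (id_a, rec_a), (id_b, rec_b) in pairs:
            yield id_a, id_b, rec_a[self.field] == rec_b[self.field]

    def fit_transform(self, pairs, y=None):
        return self.fit(pairs, y).transform(pairs)


def run_pipeline(steps, data):
    # Minimal stand-in for a pipeline: feed each step's output into the next.
    for step in steps:
        data = step.fit_transform(data)
    return data


records = {
    1: {"name": "Robert Smith", "city": "Chicago"},
    2: {"name": "Rob Smith", "city": "Chicago"},
    3: {"name": "Roberta Jones", "city": "Boston"},
    4: {"name": "Alice Wu", "city": "Chicago"},
}

decisions = list(run_pipeline([PrefixBlocker(), ExactMatchClassifier()], records))
print(decisions)
```

one wrinkle the sketch sidesteps: the blocker changes the number of samples (records in, pairs out), so any labels passed as `y` would have to refer to pairs rather than records. that mismatch is probably the main thing to resolve before this fits the sklearn pattern.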

fgregg · Sep 22 '24 21:09

we can think of blocking as related to clustering, and use that as inspo.

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans
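
a minimal sketch of what a clustering-style blocker could look like, borrowing the fit / predict / fit_predict / `labels_` shape that `KMeans` exposes. `KeyBlocker`, `pairs_from_labels`, and the surname key function are hypothetical names invented for this example, and returning `-1` for unseen keys is just a choice made for the sketch. block labels play the role of cluster labels, and candidate pairs are the records that share a label.

```python
from collections import defaultdict
from itertools import combinations


class KeyBlocker:
    """Hypothetical blocker with a clustering-style interface."""

    def __init__(self, key_func):
        self.key_func = key_func

    def fit(self, records, y=None):
        # Learn the set of block keys, loosely analogous to learning cluster centers.
        keys = sorted({self.key_func(r) for r in records})
        self.block_ids_ = {key: i for i, key in enumerate(keys)}
        # Store the block label of each training record, like KMeans.labels_.
        self.labels_ = [self.block_ids_[self.key_func(r)] for r in records]
        return self

    def predict(self, records):
        # Assign new records to known blocks; -1 marks a key not seen during fit.
        return [self.block_ids_.get(self.key_func(r), -1) for r in records]

    def fit_predict(self, records, y=None):
        return self.fit(records).labels_


def pairs_from_labels(labels):
    # Turn per-record block labels into candidate pairs of record indices.
    blocks = defaultdict(list)
    for index, label in enumerate(labels):
        if label != -1:
            blocks[label].append(index)
    for members in blocks.values():
        yield from combinations(members, 2)


def surname(name):
    # Toy blocking key: the lowercased last token of the name.
    return name.split()[-1].lower()


names = ["Robert Smith", "Rob Smith", "Roberta Jones", "Alice Wu"]

blocker = KeyBlocker(surname)
labels = blocker.fit_predict(names)
candidate_pairs = list(pairs_from_labels(labels))
print(labels, candidate_pairs)
```

with this shape the blocker keeps one output per input record, which matches the usual estimator contract; the expansion into pairs happens in a separate step.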

fgregg · Sep 22 '24 22:09