
When creating a canonical list, can non-string values be combined in various ways?

Open webmaven opened this issue 11 years ago • 2 comments

Examples:

  1. If a field is a list of strings (e.g. tags), the canonical field value could be either a de-duped union of the lists or their intersection.
  2. Numeric values could produce the mean (or median).
  3. Numeric values could produce a range (the highest and lowest values seen).
  4. Numeric ranges (e.g. 60-70) could produce a range whose low and high values are the medians or means of the input lows and highs.
  5. Geo coordinates could produce the centroid.

Etc.

For my use case I am most interested in the first four.
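
To make the first four options concrete, here is a minimal sketch in plain Python. The function names (`union_of_lists`, `intersection_of_lists`, `numeric_summary`, `observed_range`, `combine_ranges`) are hypothetical and are not part of dedupe's API. Each one assumes it receives the values of a single field collected from all records in one cluster.

```python
from statistics import mean, median


def union_of_lists(values):
    """Option 1a: de-duped union of list-valued fields, in first-seen order."""
    return list(dict.fromkeys(item for items in values for item in items))


def intersection_of_lists(values):
    """Option 1b: items present in every list, in the order of the first list."""
    if not values:
        return []
    common = set(values[0]).intersection(*values[1:])
    return [item for item in dict.fromkeys(values[0]) if item in common]


def numeric_summary(values, how=mean):
    """Option 2: a single representative number (mean by default, or median)."""
    return how(values)


def observed_range(values):
    """Option 3: the lowest and highest values seen."""
    return (min(values), max(values))


def combine_ranges(ranges, how=median):
    """Option 4: summarise the lows and the highs separately.

    `ranges` is assumed to be a list of (low, high) tuples.
    """
    lows = [low for low, _ in ranges]
    highs = [high for _, high in ranges]
    return (how(lows), how(highs))


tags = [["python", "ml"], ["ml", "dedupe"]]
print(union_of_lists(tags))
print(intersection_of_lists(tags))
print(combine_ranges([(60, 70), (62, 74), (58, 72)]))
```

Which of these is the right default probably depends on the field, so a per-field choice of combining function seems more useful than a single global rule.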

webmaven, Jul 17 '14 17:07

Our current canonicalize method does not implement those things, but could.

fgregg, Jul 18 '14 00:07

I've brought canonicalize back into core dedupe, so I'm reopening this.

fgregg, Jul 23 '17 18:07