dedupe
When creating a canonical list, can non-string values be combined in various ways?
Examples:
- If a field is a list of strings (e.g. tags), the canonical field value could be either the de-duplicated union of the lists or their intersection.
- Numeric values could produce the mean (or median).
- Numeric values could produce a range (the highest and lowest values seen).
- Numeric ranges (e.g. 60-70) could produce a range whose low and high values are the medians or means of the lows and highs.
- Geo coordinates could produce the centroid.
Etc.
For my use case I am most interested in the first four.
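To make the first four cases concrete, here is a rough sketch in plain Python. None of these helpers exist in dedupe; the names `union_tags`, `intersect_tags`, `numeric_center`, `numeric_range`, and `combine_ranges` are hypothetical, and the sketch assumes the field values for one cluster have already been collected into a list.

```python
from statistics import mean, median

# Hypothetical helpers, not part of dedupe's API.

def union_tags(tag_lists):
    """De-duplicated union of several tag lists, keeping first-seen order."""
    seen = {}
    for tags in tag_lists:
        for tag in tags:
            seen.setdefault(tag, None)
    return list(seen)

def intersect_tags(tag_lists):
    """Tags present in every list, in the order of the first list."""
    if not tag_lists:
        return []
    common = set(tag_lists[0]).intersection(*tag_lists[1:])
    return [tag for tag in dict.fromkeys(tag_lists[0]) if tag in common]

def numeric_center(values, how="mean"):
    """Mean or median of the numeric values seen in a cluster."""
    return mean(values) if how == "mean" else median(values)

def numeric_range(values):
    """(lowest, highest) of the numeric values seen in a cluster."""
    return (min(values), max(values))

def combine_ranges(ranges, how="median"):
    """Combine (low, high) pairs by aggregating lows and highs separately."""
    lows = [low for low, _ in ranges]
    highs = [high for _, high in ranges]
    return (numeric_center(lows, how), numeric_center(highs, how))

# Example cluster values (made up for illustration).
tags = [["red", "large"], ["large", "sale"]]
all_tags = union_tags(tags)
shared_tags = intersect_tags(tags)
age_band = combine_ranges([(60, 70), (62, 74), (58, 72)])
```

Each helper takes all of a cluster's values for one field and returns a single canonical value, so a per-field mapping from field name to combiner would be one way to plug this in.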
Our current canonicalize method does not implement those things, but could.
I've brought canonicalize back into core dedupe, so I'm reopening this.