zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
In highly regulated scenarios, businesses need deterministic matching as well.
Hi team! When deduplicating more than one data source, have you considered a config option that tells Zingg to ignore/skip pairs where both records originate from the same source? In my...
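To make the request concrete, here is a minimal sketch of the filtering idea in plain Python. It is not Zingg's API or configuration; the `cross_source_pairs` helper, the `source` field, and the sample records are all hypothetical and only illustrate skipping candidate pairs whose records share a source.

```python
from itertools import combinations


def cross_source_pairs(records, source_key="source"):
    """Yield candidate pairs whose two records come from different sources.

    Hypothetical helper for illustration; not part of Zingg.
    """
    for a, b in combinations(records, 2):
        # Skip pairs where both records originate from the same source.
        if a[source_key] != b[source_key]:
            yield a, b


# Made-up sample data: two records from "crm", one from "billing".
records = [
    {"id": 1, "source": "crm", "name": "Ann Lee"},
    {"id": 2, "source": "crm", "name": "Anne Lee"},
    {"id": 3, "source": "billing", "name": "Ann Lee"},
]

pairs = [(a["id"], b["id"]) for a, b in cross_source_pairs(records)]
print(pairs)  # [(1, 3), (2, 3)]
```

The pair (1, 2) is dropped because both records come from the same source, so only cross-source pairs would go on to similarity scoring.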
1. Once a Zingg model is trained, I think I can safely assume that the computational complexity (in big-O) is the same whether doing linking or further de-duplicating, yes? 1....
What if we could take SQL from, say, a dbt model or elsewhere and use it for our model training, for blocking as well as similarity? Then non-Java programmers...
Right now there is a lot of repeated code in the labeller and update labeller classes. For example, the execute method could be in one place. Code is pretty hard...
It would improve readability quite a bit if headers were bold and options (yes/no, etc.) were color coded.
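One possible way to do this in a terminal is with standard ANSI escape codes. The sketch below is an assumption about how it could look, not the labeller's actual code; the `bold` and `color_option` helpers and the green/red choice for yes/no are hypothetical.

```python
# Standard ANSI SGR escape sequences.
BOLD = "\033[1m"
GREEN = "\033[32m"
RED = "\033[31m"
RESET = "\033[0m"


def bold(text):
    """Wrap text so that ANSI-capable terminals render it in bold."""
    return f"{BOLD}{text}{RESET}"


def color_option(label):
    """Color 'yes' green and 'no' red; leave other labels unchanged."""
    colors = {"yes": GREEN, "no": RED}
    code = colors.get(label.lower())
    return f"{code}{label}{RESET}" if code else label


print(bold("Record pair"))
print(color_option("yes"), color_option("no"), color_option("cannot say"))
```

Terminals without ANSI support would show the raw escape codes, so a real implementation would likely need a way to turn the colors off.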
We can build a CLI that can filter and show results to the user, much like the labeller.
Reported by Luke from Databricks [zingg_Dec21_0823_log4j-active (1).txt](https://github.com/zinggAI/zingg/files/7761163/zingg_Dec21_0823_log4j-active.1.txt) [zingg_Dec21_0823_sdtderr.txt](https://github.com/zinggAI/zingg/files/7761167/zingg_Dec21_0823_sdtderr.txt)