zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

Results 147 zingg issues

In highly regulated scenarios, businesses need deterministic matching as well.

Which just invokes the blockingTree for each record.

good first issue

Hi team! When deduplicating more than 1 data source, have you considered a config to tell zingg to ignore/skip pairs where both records originate from the same source? In my...

question

1. Once a Zingg model is trained, I think I can safely assume that the computational complexity (in big-O) is the same whether doing linking or further de-duplicating, yes? 1....

question

What if we could take SQL from, say, a dbt model or elsewhere and use it for our model training, for blocking as well as similarity? Then non Java programmers...

enhancement

Right now there is a lot of repeated code in the labeller and update labeller classes; for example, the execute method could be in one place. Code is pretty hard...

technicalDebt

It would improve readability quite a bit if headers were bold and options (yes/no, etc.) were color coded.

enhancement

We can build a CLI that can filter and show results to the user, much like the labeller.

enhancement

Reported by Luke from Databricks [zingg_Dec21_0823_log4j-active (1).txt](https://github.com/zinggAI/zingg/files/7761163/zingg_Dec21_0823_log4j-active.1.txt) [zingg_Dec21_0823_sdtderr.txt](https://github.com/zinggAI/zingg/files/7761167/zingg_Dec21_0823_sdtderr.txt)