zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
so that time taken is less and we do not do redundant processing.
Add info on the model, e.g. https://docs.deepchecks.com/en/stable/examples/guides/quickstart_in_5_minutes.html
Let us expose our deterministic matching parts so that more use cases can be solved with Zingg.
Is there a way for us to support or build technologies like Datavant tokenization, which can then be used for matching tokens in a privacy-preserving way?
**Z Columns** Zingg uses a few internal columns to store internal and intermediate data. **Describe the solution you'd like** As of now, things work fine as expected. However, it is...
**Describe the question** Some of my entities have PO boxes, some have street addresses, some have both. I'm trying to understand the inner workings so I can use the tool smarter...
Hello! As you know I'm working with the NC 5M dataset. I am frequently restarting from scratch to test the framework I'm building. Each time, I'll run the findTrainingData +...
In the doc you recommend setting numPartitions to ~20-30x the number of worker cores. Is that a good rule of thumb for all job types? (e.g. findTrainingData, trainMatch, link, etc)
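For reference, the arithmetic behind that rule of thumb can be sketched as below. This is only an illustration of the 20-30x multiplier quoted in the question; the helper name `suggested_num_partitions` and the example cluster size are hypothetical and are not part of Zingg, and whether the same range suits every job type is exactly what the question leaves open.

```python
from typing import Tuple


def suggested_num_partitions(worker_cores: int, low: int = 20, high: int = 30) -> Tuple[int, int]:
    """Return the (min, max) numPartitions range for the 20-30x rule of thumb.

    Hypothetical helper for illustration only; not a Zingg API.
    """
    if worker_cores <= 0:
        raise ValueError("worker_cores must be positive")
    return worker_cores * low, worker_cores * high


# Assumed example cluster: 4 workers with 8 cores each.
total_cores = 4 * 8
min_parts, max_parts = suggested_num_partitions(total_cores)
print(f"numPartitions between {min_parts} and {max_parts}")  # numPartitions between 640 and 960
```

The resulting value would then be set as `numPartitions` in the job configuration and tuned from there based on observed run times.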
What configuration should we use for generateDocs? I tried the one I use for findTrainingData but ended up getting an error (stderr attached): `java.io.FileNotFoundException: /home/[email protected]/NCVoter360/zinggModels/April19_voters/model.html (No such file or directory)`...
Having a prebuilt simple model can be helpful to get people started.