zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
The documenter should display the total number of records, and how many are marked as matches and non-matches.
We need a way to detect abbreviations and support matching them.
Alter the febrl schema and change ssn to a date type. Make the match type fuzzy and see how it works.
https://www.youtube.com/watch?v=AZ2mUSsgbM0 Using PyScript, can we build an interactive labeller?
We do not have a way for users to understand the blocking tree. We should add that to the documenter.
Build a new febrl test file in XLS/XLSX format and test it
Right now the user has to compile and specify blocking and similarity functions as per Zingg. It would be cool if they could just specify some functions which we can...
Many users have name and address in one field. It would be helpful to have a way to extract these fields, maybe through CRFs? This will lead to a far...
We need to figure out the number of runs, type of runs, location, env, etc.
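
For the documenter item above (total records, matches, non-matches), the requested summary could be sketched roughly as follows. This is only an illustration and not Zingg's documenter code: the field name `is_match`, the encoding (1 for a match, 0 for a non-match, anything else treated as unlabelled), and the helper `summarize_labels` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical label encoding, assumed for this sketch only.
MATCH = 1
NON_MATCH = 0


def summarize_labels(records, label_field="is_match"):
    """Return total, match, and non-match counts for labelled records.

    `records` is a list of dicts; `label_field` is a hypothetical column name.
    Records whose label is neither MATCH nor NON_MATCH only count toward the total.
    """
    counts = Counter(rec.get(label_field) for rec in records)
    return {
        "total": len(records),
        "matches": counts[MATCH],
        "non_matches": counts[NON_MATCH],
    }


sample = [
    {"id": 1, "is_match": 1},
    {"id": 2, "is_match": 0},
    {"id": 3, "is_match": 1},
    {"id": 4, "is_match": None},  # not yet labelled
]
summary = summarize_labels(sample)
print(summary)
```

Keeping the total separate from the two label counts means unlabelled or unsure records remain visible as the difference between them.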