zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
We need an error-code and information-reporting framework so that jobs orchestrated through DAG schedulers such as Airflow can be handled gracefully. Check best practices here.
Explore whether we can provide an easy way to build the JSON args. Could we use something like https://github.com/json-editor/json-editor ?
Descriptive text and word-vector or n-gram data may need different kinds of blocking. Evaluate this.
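As a starting point for discussion, here is a minimal sketch of how job failures could be mapped to stable exit codes that an orchestrator can act on. Everything in it is an assumption for illustration: the `ExitCode` values, the `JobError` hierarchy, and `run_job` are hypothetical names and are not part of Zingg.

```python
import enum


class ExitCode(enum.IntEnum):
    """Hypothetical exit codes; none of these are defined by Zingg."""
    SUCCESS = 0
    CONFIG_ERROR = 2
    DATA_ERROR = 3
    UNEXPECTED = 70


class JobError(Exception):
    """Base class carrying the exit code the orchestrator should see."""
    exit_code = ExitCode.UNEXPECTED


class ConfigError(JobError):
    exit_code = ExitCode.CONFIG_ERROR


class DataError(JobError):
    exit_code = ExitCode.DATA_ERROR


def run_job(job):
    """Run a callable and translate its outcome into (exit_code, message)."""
    try:
        job()
    except JobError as err:
        # Classified failure: report its own exit code.
        return err.exit_code, f"{type(err).__name__}: {err}"
    except Exception as err:
        # Anything unclassified falls back to a generic code.
        return ExitCode.UNEXPECTED, f"Unexpected: {err}"
    return ExitCode.SUCCESS, "ok"


def bad_config():
    """Example job that fails with a classified error."""
    raise ConfigError("missing field definitions")
```

A wrapper script could pass the returned code to `sys.exit` and write the message to stderr, so that a shell-based task in a DAG fails with a code that distinguishes configuration problems from data problems.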
putting this out there as a thought,
Now that the documents are in a better consumable shape, we should add the case studies
In some cases an SVM may work better, so provide a way to plug one in, with logistic regression remaining the default.
This is an umbrella feature request to make Zingg smarter at understanding column types - how cool would it be to understand that a particular column denotes email...
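To make the idea concrete, one simple approach is to sample a column's values and check what fraction match a pattern. The sketch below does this for email only; the function name `looks_like_email_column`, the regular expression, and the 0.9 threshold are illustrative assumptions, not existing Zingg behaviour.

```python
import re

# Deliberately loose pattern (assumption): something@something.something.
# Real-world email validation is more involved than this.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_like_email_column(values, threshold=0.9):
    """Return True if at least `threshold` of the non-empty values look like emails."""
    # Ignore None, empty strings, and whitespace-only values.
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return False
    hits = sum(1 for v in non_empty if EMAIL_RE.match(v))
    return hits / len(non_empty) >= threshold
```

The threshold tolerates a few dirty values in an otherwise email-typed column; the same shape would extend to other types by swapping in a different pattern or check.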
Explore Terraform.
One dimension of similarity is column similarity - predicting which columns are the same and can be joined/matched is a problem in itself.
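One naive baseline for this, offered only as a sketch, is to compare the sets of distinct values in each pair of columns and keep pairs with high Jaccard overlap. The names `jaccard` and `candidate_column_pairs` and the 0.5 cutoff are assumptions for illustration; this is a swapped-in simple technique, not something Zingg implements.

```python
def jaccard(a, b):
    """Jaccard similarity of the distinct values in two iterables."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def candidate_column_pairs(left, right, min_score=0.5):
    """Score every column pair across two tables.

    `left` and `right` map column names to lists of values.
    Returns (score, left_column, right_column) tuples, best first.
    """
    pairs = []
    for left_name, left_values in left.items():
        for right_name, right_values in right.items():
            score = jaccard(left_values, right_values)
            if score >= min_score:
                pairs.append((score, left_name, right_name))
    return sorted(pairs, reverse=True)
```

Exact value overlap only works when the two tables share many identical values; columns that hold the same kind of data with few common values would need distribution- or pattern-based features instead.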
We should be able to resolve unstructured entities to structured records in a database. That will enable a host of applications for ekg, fraud, etc.