zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.14 s - in zingg.hash.TestGetAs 4 seconds is way too long for this test. Investigate and suggest what needs...
It would be nice to have a native Snowflake version of Zingg that can run without Spark. While looking at Snowpark, here are the first-level thoughts - Need to...
**Is your feature request related to a problem? Please describe.** If you have an entity represented as first name & last name but are limited in other identifying fields, are...
py4j.protocol.Py4JJavaError: An error occurred while calling o148.execute. : zingg.client.ZinggClientException: zingg.client.FieldDefinition; local class incompatible: stream classdesc serialVersionUID = -2785755064886098506, local class serialVersionUID = 607273313522895964 at zingg.Matcher.execute(Matcher.java:159) at zingg.client.Client.execute(Client.java:215) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)...
Configure the fieldDefinition section for two sources with different schemas - see #204
**Is your feature request related to a problem? Please describe.** Zingg config supports more than what is explicitly documented. New users, particularly users new to Spark as well (like me),...
**Is your feature request related to a problem? Please describe.** When feeding Zingg ndjson that has depth, such as an array or an object, Zingg flattens the record, discarding the...
**Describe the bug** When an invalid data format is given, it results in an exception with no message. ``` 022-03-04 07:27:55,265 [main] WARN org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry - The function round replaced a previously...