
Scalable identity resolution, entity resolution, data mastering and deduplication using ML

Results: 147 zingg issues

Right now ColName.COL_PREFIX is used all over the code. We should work out where it is actually needed and then improve the code.

The current Matcher contains the graph scoring and other graph logic, which tightly couples them. We should move the scoring to a separate class, and also think through the rest of the graph logic...
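One way to picture the proposed split is to put scoring behind its own interface and have the matcher delegate to it. The sketch below is a minimal illustration only: GraphScorer, DegreeScorer and ClusterMatcher are hypothetical names, not Zingg's API, and the degree-based score is an assumed example strategy rather than Zingg's actual graph scoring.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical abstraction: scoring sits behind its own interface, so the
// matcher no longer needs to know how graph scores are computed.
interface GraphScorer {
    // Scores one node of a cluster given as an adjacency map (node -> neighbours).
    double score(Map<String, Set<String>> cluster, String node);
}

// Assumed example strategy: the fraction of the other cluster members
// that the node is directly linked to.
class DegreeScorer implements GraphScorer {
    @Override
    public double score(Map<String, Set<String>> cluster, String node) {
        int others = cluster.size() - 1;
        if (others <= 0) {
            return 1.0; // a singleton cluster is trivially consistent
        }
        Set<String> neighbours = cluster.getOrDefault(node, Set.of());
        return (double) neighbours.size() / others;
    }
}

// Hypothetical matcher that delegates scoring instead of implementing it.
class ClusterMatcher {
    private final GraphScorer scorer;

    ClusterMatcher(GraphScorer scorer) {
        this.scorer = scorer;
    }

    double scoreNode(Map<String, Set<String>> cluster, String node) {
        return scorer.score(cluster, node);
    }
}
```

Because the scorer is injected through the constructor, a different scoring strategy can be swapped in or unit-tested without touching the matching code.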

Move stop words to the preprocessor

enhancement

Blocking algorithms are currently heavily dependent on field order: changing the order of fields in fieldDefinition gives vastly different results. We should make them more consistent.
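One simple way to remove the sensitivity to config order is to put the field definitions into a canonical order before blocking is built from them. The sketch below assumes a hypothetical FieldDef class standing in for a fieldDefinition entry and sorts by field name; this makes results repeatable for the same set of fields, but it does not choose a *better* order, so it is a mitigation rather than a fix for the blocking algorithms themselves.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for one entry of the fieldDefinition config.
final class FieldDef {
    final String fieldName;
    final String matchType;

    FieldDef(String fieldName, String matchType) {
        this.fieldName = fieldName;
        this.matchType = matchType;
    }
}

final class BlockingFields {
    private BlockingFields() {}

    // Returns a copy of the definitions in a canonical order (by field name),
    // so anything built from it no longer depends on the order in the config.
    static List<FieldDef> canonicalOrder(List<FieldDef> defs) {
        List<FieldDef> sorted = new ArrayList<>(defs);
        sorted.sort(Comparator.comparing((FieldDef d) -> d.fieldName));
        return sorted;
    }
}
```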

When we read a dataframe, we add it to the pipe, which modifies the original args object. In a way that is ok, as we are only enriching the args....
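If the mutation ever becomes a problem, a copy-on-enrich pattern keeps the original args untouched. In this minimal sketch, Pipe is a hypothetical simplified class, not Zingg's, and a plain Object stands in for the dataframe; enriching returns a new instance instead of changing the existing one.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified pipe: connection properties plus the data read from it.
final class Pipe {
    private final Map<String, String> props;
    private final Object dataset; // stands in for the dataframe; null until read

    Pipe(Map<String, String> props, Object dataset) {
        this.props = Collections.unmodifiableMap(new HashMap<>(props));
        this.dataset = dataset;
    }

    Map<String, String> props() { return props; }

    Object dataset() { return dataset; }

    // Returns an enriched copy instead of mutating this pipe, so any args
    // object that still references the original pipe sees no change.
    Pipe withDataset(Object newDataset) {
        return new Pipe(props, newDataset);
    }
}
```

The trade-off is an extra object per read; in exchange, the args the user passed in can be reused or compared safely after a run.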

(C) 2021 Zingg.AI: change the year and have one header for both analytics and zingg.

The current ZFrame has methods like drop(String, String...), which can be replaced with drop(String...).
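To illustrate why a single varargs signature is enough, here is a minimal sketch using a hypothetical SimpleFrame that only tracks column names (it is not ZFrame). One drop(String...) method handles both single-column and multi-column calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for a frame that only tracks column names.
final class SimpleFrame {
    private final List<String> columns;

    SimpleFrame(String... columns) {
        this.columns = List.of(columns);
    }

    List<String> columns() { return columns; }

    // A single varargs method covers drop("a") and drop("a", "b") alike,
    // so a separate drop(String, String...) overload is not needed.
    SimpleFrame drop(String... toDrop) {
        List<String> remaining = new ArrayList<>(columns);
        remaining.removeAll(Arrays.asList(toDrop));
        return new SimpleFrame(remaining.toArray(new String[0]));
    }
}
```

One design difference is worth noting: drop(String, String...) forces callers to pass at least one column at compile time, whereas drop(String...) also accepts drop(), which in this sketch simply returns an unchanged copy.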

Some methods currently implemented in ZFrame, e.g. getAsString, should actually be in the Row, Column, StructType, etc. classes. One thing to remember: StructField is not serializable in Snowpark, so...

technicalDebt
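The following minimal sketch shows the general shape of moving a row-level accessor such as getAsString off the frame and onto a row abstraction. ZRow and MapRow are hypothetical names, and the map-backed implementation is only for illustration; an engine-specific wrapper would implement get() on top of that engine's own row type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical row abstraction: column-level accessors such as getAsString
// live on the row type rather than on the frame.
interface ZRow {
    Object get(String column);

    // Default implementation shared by all engine-specific row wrappers.
    default String getAsString(String column) {
        Object value = get(column);
        return value == null ? null : value.toString();
    }
}

// Illustrative map-backed implementation used only to exercise the interface.
final class MapRow implements ZRow {
    private final Map<String, Object> values;

    MapRow(Map<String, Object> values) {
        this.values = new HashMap<>(values);
    }

    @Override
    public Object get(String column) {
        return values.get(column);
    }
}
```

With a default method, each backend only supplies get(), and getAsString behaves the same everywhere.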

A preprocessing phase is needed that converts all data to lower case before any phase starts. This is especially relevant for stop words and the recommender, as those are currently case...
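A preprocessing step of this kind could lower-case each value once and then strip stop words, so later phases never see mixed case. The sketch below is an assumption-laden illustration: Preprocessor is a hypothetical class, whitespace tokenisation is an assumed simplification, and stop words are normalised to lower case so that matching does not depend on how they were configured.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical preprocessing step run once before any phase: lower-case the
// value, then drop stop words (whitespace tokenisation is a simplification).
final class Preprocessor {
    private final Set<String> stopWords;

    Preprocessor(Set<String> stopWords) {
        // Normalise the stop words too, so "The" and "the" are treated alike.
        this.stopWords = stopWords.stream()
                .map(w -> w.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
    }

    String process(String value) {
        if (value == null) {
            return null;
        }
        String lower = value.toLowerCase(Locale.ROOT);
        return Arrays.stream(lower.trim().split("\\s+"))
                .filter(token -> !token.isEmpty() && !stopWords.contains(token))
                .collect(Collectors.joining(" "));
    }
}
```

For example, with stop words {"The", "inc"}, "The ACME Corp" becomes "acme corp" and "ACME Inc" becomes "acme". Locale.ROOT is used so lower-casing does not change with the JVM's default locale (the Turkish dotless i being the classic pitfall).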