zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
`protected Dataset getBlocks(Dataset blocked) throws Exception { return DSUtil.joinWithItself(blocked, ColName.HASH_COL, true).cache(); }` This method is not called from the code. Will it lead to issues with the linker and resolver?
Update Dockerfile and Readme.md from 0.4.0 to 0.4.1-SNAPSHOT in the enterprise branch
The issue is due to a case-sensitive comparison between the column name given as input and the one in the config. For example, the following works fine: ./scripts/zingg.sh --phase recommend --conf examples/febrl/config.json --column XYZ but not ./scripts/zingg.sh...
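One way to avoid this kind of mismatch is to compare column names case-insensitively and then use the name exactly as it is spelled in the config. The following is a minimal sketch of that idea only, not Zingg's actual fix: the function `resolve_column` and the sample field names are hypothetical and do not come from the Zingg codebase.

```python
from typing import Iterable, Optional


def resolve_column(requested: str, configured: Iterable[str]) -> Optional[str]:
    """Return the configured name that matches `requested` ignoring case, or None.

    Hypothetical helper for illustration; not part of Zingg.
    """
    wanted = requested.strip().casefold()
    for name in configured:
        if name.casefold() == wanted:
            # Return the spelling used in the config so later lookups stay consistent.
            return name
    return None


# Hypothetical field names standing in for those listed in a config file.
fields = ["fname", "lname", "XYZ"]
print(resolve_column("xyz", fields))
print(resolve_column("missing", fields))
```

Returning the configured spelling, rather than the user's input, keeps any later exact-match lookups working without further changes.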
This makes the Read the Docs build fail ` sonal@sonal-mac docs % rm -rf _build/html; make html Running Sphinx v7.2.6 path is /Users/sonal/zingg/python/zingg making output directory... done loading pickled environment... done building [mo]:...
This PR is for issue [651](https://github.com/zinggAI/zingg/issues/651). I have used this meta.yaml for the recipe. Commands used: `conda update conda`, `conda install conda-build`, `conda info --envs`, `conda activate base`, `conda build condaRecipe`
(.venv) vikasgupta@Vikass-MacBook-Air /tmp % databricks-connect test * PySpark is installed at /opt/homebrew/lib/python3.10/site-packages/pyspark * Checking SPARK_HOME * Checking java version java version "1.8.0_351" Java(TM) SE Runtime Environment (build 1.8.0_351-b10) Java HotSpot(TM)...
Some of the methods are tied to a phase; some are probably invoked directly while Jackson sets the property? Need to see what's really needed here and if the code...
Reopening this topic in a new issue, I'm not able to get this working. ``` from zingg.client import * options = ClientOptions([ClientOptions.COLLECT_METRICS, "false"]) > AttributeError: type object 'ClientOptions' has no...
2023-11-24 17:49:58,143 [main] WARN org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry - The function affinegapsimilarityfunction replaced a previously registered function. 2023-11-24 17:49:58,143 [main] WARN org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry - The function jarowinklerfunction replaced a previously registered function. 2023-11-24 17:49:58,143...