zingg issues

Have an option to switch off stop word generation

So that less time is taken and we avoid redundant processing.

sonalgoyal

model features

Add info on the model, e.g. https://docs.deepchecks.com/en/stable/examples/guides/quickstart_in_5_minutes.html

sonalgoyal

Expose deterministic matching

Let us expose our deterministic matching parts so that more use cases can be solved with Zingg

sonalgoyal

privacy preserving entity resolution

Is there a way for us to support/build technologies like Datavant tokenization, which can then be used for matching tokens in a privacy-preserving way?

sonalgoyal

question
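To make the idea in the privacy-preserving question concrete, here is a minimal sketch of keyed-hash tokenization. It is not Datavant's method and not an existing Zingg feature; the helper names `normalize` and `tokenize` and the sample key are hypothetical. The assumption is that each party normalizes a field the same way and applies HMAC-SHA256 with a shared secret, so that equal values produce equal tokens without exposing the raw value.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Hypothetical normalization: Unicode NFKC, case-folding, collapsed whitespace."""
    folded = unicodedata.normalize("NFKC", value).casefold()
    return " ".join(folded.split())


def tokenize(value: str, secret_key: bytes) -> str:
    """Return a hex HMAC-SHA256 token for the normalized value."""
    message = normalize(value).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# Illustration only: a real key would be generated and stored securely.
key = b"shared-secret-for-illustration-only"

token_a = tokenize("Jane  DOE", key)
token_b = tokenize("jane doe", key)
token_c = tokenize("john doe", key)

print(token_a == token_b)  # same value after normalization
print(token_a == token_c)  # different values
```

Tokens built this way only support exact matching on the normalized value, so typos or spelling variants would produce unrelated tokens. Fuzzy matching over protected data needs a different technique and is outside this sketch.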

Rationalize the Columns used within Zingg

**Z Columns** Zingg uses a few internal columns to store internal and intermediate data. **Describe the solution you'd like** As of now, things work fine, as expected. However, it is...

navinrathore

Matching against empty fields and out of order fields

1

**Describe the question** Some of my entities have PO boxes, some have street addresses, some have both. I'm trying to understand the inner workings so I can use the tool smarter...

tomdavidson

question

Suggestions for FindTrainingData step

9

Hello! As you know I'm working with the NC 5M dataset. I am frequently restarting from scratch to test the framework I'm building. Each time, I'll run the findTrainingData +...

lsbilbro

question

numPartitions guidance?

1

In the doc you recommend setting numPartitions to ~20-30x the number of worker cores. Is that a good rule of thumb for all job types? (e.g. findTrainingData, trainMatch, link, etc)

lsbilbro

question
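The rule of thumb quoted in the numPartitions question (about 20-30x the number of worker cores) can be worked through as simple arithmetic. This is a sketch only: the function name `suggested_num_partitions` is hypothetical, and whether the same multiplier suits every phase is exactly what the question leaves open.

```python
from typing import Tuple


def suggested_num_partitions(worker_cores: int,
                             low_factor: int = 20,
                             high_factor: int = 30) -> Tuple[int, int]:
    """Return the (low, high) range given by the 20-30x worker cores rule of thumb."""
    if worker_cores <= 0:
        raise ValueError("worker_cores must be positive")
    return worker_cores * low_factor, worker_cores * high_factor


# Example: a cluster with 16 worker cores in total.
low, high = suggested_num_partitions(16)
print(low, high)  # 320 480
```

The result is a starting range for the `numPartitions` setting, to be adjusted from observed job behavior.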

What configuration should be used for the generateDocs phase?

1

What configuration should we use for generateDocs? I tried the one I use for findTrainingData but ended up getting an error (stderr attached): `java.io.FileNotFoundException: /home/[email protected]/NCVoter360/zinggModels/April19_voters/model.html (No such file or directory)`...

lsbilbro

question

Unsupervised model

2

Having a prebuilt simple model can be helpful to get people started.

sonalgoyal

enhancement
