data preparation - tagging

Open sonalgoyal opened this issue 2 years ago • 2 comments

Matching can be greatly improved if we can segment and tag individual attributes, for example in an address. Check whether we can use libpostal or build something similar that is generic. We can also build for the case where the table already has data and we use that for training.

sonalgoyal • Mar 20 '22 12:03

I'm using libpostal for parsing addresses and it works rather well. The normalization via the expand method doesn't quite make sense to me, so I'm not using it right now, but I suppose if fully automatic standardization were possible then ER would not be needed.

Addresses seem like a common use case, but what is Zingg's scope for data prep?

tomdavidson • Mar 22 '22 06:03

Currently we do not have any data preparation facilities, but we want to support:

a. Field extraction. Say one field has name, address, etc. combined. Matching will improve if we can break it up into components.
b. Normalization. This can be done by learning from the matching records. Say the records 118 E Avenue and 118 E Av match. That means Avenue is the same as Av. We can learn these patterns and suggest them to the user.
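
To make the normalization idea concrete, here is a minimal sketch of how such patterns might be learned from matched pairs. It is not Zingg code: the function name `learn_token_equivalences`, the `min_support` threshold, and every sample pair other than 118 E Avenue / 118 E Av are made up for illustration. It assumes both records tokenize on whitespace to the same length and differ in exactly one token.

```python
from collections import Counter

def learn_token_equivalences(matched_pairs, min_support=2):
    """Count token substitutions seen across matched record pairs.

    Hypothetical helper: only pairs with equal token counts and exactly
    one differing token contribute a candidate equivalence.
    """
    counts = Counter()
    for left, right in matched_pairs:
        a, b = left.lower().split(), right.lower().split()
        if len(a) != len(b):
            continue  # skip pairs we cannot align token by token
        diffs = [(x, y) for x, y in zip(a, b) if x != y]
        if len(diffs) == 1:
            # Sort so (avenue, av) and (av, avenue) count as the same pattern
            counts[tuple(sorted(diffs[0]))] += 1
    # Keep only patterns seen often enough to be worth suggesting
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Illustrative matched pairs; only the first comes from the discussion above
pairs = [
    ("118 E Avenue", "118 E Av"),
    ("42 Park Avenue", "42 Park Av"),
    ("7 Main Street", "7 Main St"),
]
suggestions = learn_token_equivalences(pairs)
```

With these sample pairs, only the Avenue/Av pattern reaches the support threshold, so it would be the one suggested to the user. The Street/St pattern appears once and is held back until more evidence accumulates.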

sonalgoyal • Mar 22 '22 07:03