DeezyMatch

A Flexible Deep Learning Approach to Fuzzy String Matching

Results 30 DeezyMatch issues

- Performance of the ranker as a function of #queries, for different #candidates
- This needs to be done for different ranking methods

Refer to: https://github.com/Living-with-machines/DeezyMatch_tutorials/issues/4#issue-1317011658

tutorial

From @mcollardanuy:
- converting the query into a vector,
- converting a candidate into a vector, and
- deciding whether they are variations of the same word or not
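The three steps listed above can be illustrated with a toy sketch. Note the assumptions: DeezyMatch learns its vectors with a neural network, whereas this example swaps in plain character-bigram counts and cosine similarity; `to_vector`, `cosine`, `is_variation` and the `threshold` value are hypothetical names chosen for illustration and are not part of the DeezyMatch API.

```python
from collections import Counter
from math import sqrt


def to_vector(text, n=2):
    """Hypothetical stand-in for a learned encoder: character n-gram counts."""
    padded = f" {text.lower()} "  # pad so word boundaries form n-grams too
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def is_variation(query, candidate, threshold=0.5):
    """Steps 1-3: vectorize the query, vectorize the candidate, then decide."""
    return cosine(to_vector(query), to_vector(candidate)) >= threshold


print(is_variation("London", "Londen"))
print(is_variation("London", "Paris"))
```

The threshold is arbitrary here; in a trained matcher the decision step would come from the model rather than a fixed cut-off.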

enhancement
new feature

Prepare tutorial on using DeezyMatch for OCR: https://dh2022.adho.org/workshops-and-tutorials/wt-13 > We will show how a DeezyMatch model can be created from token-level alignments of OCRed text and their manual corrections. We...
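As a rough idea of what building string pairs from token-level alignments could look like, the sketch below aligns an OCRed token sequence with its corrected version using Python's standard `difflib.SequenceMatcher`. This is an assumption made for illustration: the tutorial's actual alignment method is not stated here, and `aligned_token_pairs` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def aligned_token_pairs(ocr_tokens, corrected_tokens):
    """Pair OCR tokens with corrected tokens where the alignment is one-to-one."""
    matcher = SequenceMatcher(a=ocr_tokens, b=corrected_tokens, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        same_length = (i2 - i1) == (j2 - j1)
        # Keep identical runs and equal-length replacements;
        # skip insertions, deletions, and uneven replacements.
        if tag == "equal" or (tag == "replace" and same_length):
            pairs.extend(zip(ocr_tokens[i1:i2], corrected_tokens[j1:j2]))
    return pairs


ocr = "Tlie quick hrown fox".split()
corrected = "The quick brown fox".split()
pairs = aligned_token_pairs(ocr, corrected)

# Pairs that differ are candidate positive examples of OCR variation.
variations = [(o, c) for o, c in pairs if o != c]
print(variations)
```

Pairs where the two tokens differ are the interesting ones for training, since they record how OCR errors distort a known word.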

tutorial

Prepare tutorial on using DeezyMatch with the Heritage Gazetteer of Libya: https://dh2022.adho.org/workshops-and-tutorials/wt-13 > We will show how to create DeezyMatch models that are trained on Arabic name variations and...

tutorial

Context: how do we deal with missing vocabulary when fine-tuning a model? This is particularly an issue with ngram/word tokenization (the `characters_v001.vocab` solves it in part for models using char tokenization).
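To make the problem concrete, here is a minimal sketch of one common way to cope with unseen tokens: reserve an unknown-token index and map every out-of-vocabulary n-gram to it. This is an assumption for illustration, not a description of how DeezyMatch handles the issue; `UNK`, `build_vocab`, `char_ngrams`, `encode` and `oov_rate` are hypothetical names.

```python
UNK = "<unk>"  # hypothetical reserved token for anything unseen


def build_vocab(tokens):
    """Assign an index to each distinct token, keeping index 0 for UNK."""
    vocab = {UNK: 0}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def char_ngrams(text, n=2):
    """Split a string into overlapping character n-grams."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def encode(tokens, vocab):
    """Map tokens to indices, falling back to UNK for missing vocabulary."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]


def oov_rate(tokens, vocab):
    """Fraction of tokens absent from the vocabulary."""
    if not tokens:
        return 0.0
    return sum(tok not in vocab for tok in tokens) / len(tokens)


# Vocabulary seen when the original model was built.
vocab = build_vocab(char_ngrams("london"))

# New data at fine-tuning time contains n-grams the vocabulary lacks.
new_tokens = char_ngrams("londres")
print(encode(new_tokens, vocab))
print(oov_rate(new_tokens, vocab))
```

Measuring the out-of-vocabulary rate before fine-tuning gives a quick sense of how much of the new data would collapse onto the unknown token.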

enhancement
new feature

- [ ] Document some of the methods that we have tried in our experiments so far to generate negative examples.
- [ ] Add experiments/case-studies so far

Related docs:...

documentation
tutorial

Extend the list of accepted values for positive matches. Change `data_processing.py` (see in particular [lines 37-43](https://github.com/Living-with-machines/DeezyMatch/blob/06f80171cf36543ce6941c960c2d89366fb03fab/DeezyMatch/data_processing.py#L37), but you may have to make other changes in subsequent lines) so it also accepts,...

enhancement
help wanted

@fedenanni suggested: https://arxiv.org/pdf/2103.06874.pdf