DeezyMatch issues

Results 30 DeezyMatch issues

Scaling tests

- Performance of the ranker as a function of #queries, for different #candidates
- This needs to be done for different ranking methods
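A rough sketch of how such a timing grid could be collected, in plain Python. Everything here is hypothetical: `rank_brute_force` is a placeholder ranker (not one of DeezyMatch's ranking methods), and `time_ranker`, `random_vectors`, and the vector dimension are illustration-only names and values. Any callable taking `(query, candidates)` could be swapped in to compare ranking methods.

```python
import random
import time


def rank_brute_force(query, candidates, top_k=3):
    """Placeholder ranker: indices of the top_k candidates closest to the query
    by squared Euclidean distance. Not a DeezyMatch ranking method."""
    def dist(candidate):
        return sum((q - x) ** 2 for q, x in zip(query, candidate))
    order = sorted(range(len(candidates)), key=lambda i: dist(candidates[i]))
    return order[:top_k]


def random_vectors(count, dim, rng):
    """Generate `count` random vectors of length `dim`."""
    return [[rng.random() for _ in range(dim)] for _ in range(count)]


def time_ranker(ranker, query_counts, candidate_counts, dim=8, seed=0):
    """Return {(n_queries, n_candidates): seconds} for one ranker."""
    rng = random.Random(seed)
    timings = {}
    for n_cand in candidate_counts:
        candidates = random_vectors(n_cand, dim, rng)
        for n_query in query_counts:
            queries = random_vectors(n_query, dim, rng)
            start = time.perf_counter()
            for query in queries:
                ranker(query, candidates)
            timings[(n_query, n_cand)] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    results = time_ranker(rank_brute_force, [10, 100], [100, 1000])
    for (n_query, n_cand), seconds in sorted(results.items()):
        print(f"queries={n_query:>4} candidates={n_cand:>5} time={seconds:.4f}s")
```

Running the same grid once per ranking method would give one timing table per method, which is the comparison the issue asks for.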

kasra-hosseini

[Tutorials] Issue with pytorch GPU

Refer to: https://github.com/Living-with-machines/DeezyMatch_tutorials/issues/4#issue-1317011658

kasra-hosseini

tutorial

Query/Candidate matching on-the-fly

From @mcollardanuy:

- converting the query into a vector,
- converting a candidate into a vector, and
- deciding whether they are variations of the same word or not
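The three steps above can be sketched as follows. This is only an illustration of the flow, not DeezyMatch's API: `embed` is a toy stand-in that counts character bigrams, where a real setup would use the vectors produced by a trained model, and `is_variation` with its `threshold` value is a hypothetical decision rule based on cosine similarity.

```python
import math
from collections import Counter


def embed(text, n=2):
    """Toy stand-in for a learned encoder: counts of character n-grams."""
    padded = f"#{text.lower()}#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_variation(query, candidate, threshold=0.5):
    """Decide on the fly whether two strings look like variants of one word."""
    return cosine(embed(query), embed(candidate)) >= threshold


if __name__ == "__main__":
    print(is_variation("London", "Londen"))
    print(is_variation("London", "Paris"))
```

The threshold is arbitrary here; in practice it would have to be tuned on labelled pairs.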

kasra-hosseini

enhancement

new feature

Allow disabling cosine similarity in candidate ranking

mcollardanuy

enhancement

Add OCR tutorial for DH2022

Prepare tutorial on using DeezyMatch for OCR: https://dh2022.adho.org/workshops-and-tutorials/wt-13

> We will show how a DeezyMatch model can be created from token-level alignments of OCRed text and their manual corrections. We...

mcollardanuy

tutorial

Add Heritage Gazetteer of Libya tutorial for DH2022

Prepare tutorial on using DeezyMatch with the Heritage Gazetteer of Libya: https://dh2022.adho.org/workshops-and-tutorials/wt-13

> We will show how to create DeezyMatch models that are trained on Arabic name variations and...

mcollardanuy

tutorial

Add option to extend the vocabulary when fine-tuning a model

Context: how do we deal with missing vocabulary when fine-tuning a model? This is particularly an issue with ngram/word tokenization (the `characters_v001.vocab` file solves it in part for models using char tokenization).
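One possible shape for such an option, as a minimal sketch. It assumes the vocabulary is a token-to-index mapping with contiguous indices starting at 0, which may not match how DeezyMatch stores it; `extend_vocab` is a hypothetical helper, not an existing function. Unseen tokens are appended at the end so that indices already used by the pretrained model stay unchanged.

```python
def extend_vocab(vocab, new_tokens):
    """Return a copy of `vocab` with unseen tokens appended, plus the list of
    tokens that were added. Existing token indices are never changed.

    Assumes indices are contiguous (0 .. len(vocab) - 1).
    """
    extended = dict(vocab)
    added = []
    for token in new_tokens:
        if token not in extended:
            extended[token] = len(extended)
            added.append(token)
    return extended, added


if __name__ == "__main__":
    pretrained = {"<pad>": 0, "lon": 1, "don": 2}
    fine_tune_tokens = ["don", "den", "den", "par"]
    extended, added = extend_vocab(pretrained, fine_tune_tokens)
    print(extended)
    print(added)
```

Under this assumption, the model's embedding layer would also need one new row per added token before fine-tuning, since the new indices fall outside the original embedding matrix.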

mcollardanuy

enhancement

new feature

Improve documentation on generating train/valid/test datasets

- [ ] Document some of the methods that we have tried in our experiments so far to generate negative examples.
- [ ] Add experiments/case-studies so far

Related docs:...

kasra-hosseini

documentation

tutorial

Column 3 accepts (case-insensitive): [true, false, 0, 1], extend this to other cases: "Correct" "Wrong"

Extend list of accepted values for positive matches. Change `data_processing.py` (see in particular [lines 37-43](https://github.com/Living-with-machines/DeezyMatch/blob/06f80171cf36543ce6941c960c2d89366fb03fab/DeezyMatch/data_processing.py#L37), but you may have to do other changes in subsequent lines) so it also accepts,...
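A minimal sketch of what the extended parsing could look like. It is not the code in `data_processing.py`: `parse_label`, `TRUE_VALUES`, and `FALSE_VALUES` are hypothetical names, and the accepted values are limited to those named in the issue title (true, false, 0, 1, "Correct", "Wrong"), compared case-insensitively.

```python
# Accepted spellings, stored lowercase so matching is case-insensitive.
TRUE_VALUES = {"true", "1", "correct"}
FALSE_VALUES = {"false", "0", "wrong"}


def parse_label(raw):
    """Map a column-3 value to a boolean, ignoring case and surrounding spaces."""
    value = str(raw).strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"Unrecognised label in column 3: {raw!r}")


if __name__ == "__main__":
    for raw in ["TRUE", "0", "Correct", " wrong "]:
        print(repr(raw), "->", parse_label(raw))
```

Keeping the accepted values in two sets means further spellings can be added in one place, and raising on anything unrecognised avoids silently mislabelling pairs.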

kasra-hosseini

enhancement

help wanted

Paper: Efficient Tokenization-Free Encoder

@fedenanni suggested: link: https://arxiv.org/pdf/2103.06874.pdf

kasra-hosseini
