Mariona issues

Results: 6 issues of Mariona

Allow disabling cosine similarity in candidate ranking

enhancement

Add OCR tutorial for DH2022

Prepare tutorial on using DeezyMatch for OCR: https://dh2022.adho.org/workshops-and-tutorials/wt-13 > We will show how a DeezyMatch model can be created from token-level alignments of OCRed text and their manual corrections. We...

tutorial

Add Heritage Gazetteer of Libya tutorial for DH2022

Prepare tutorial on using DeezyMatch with the Heritage Gazetteer of Libya: https://dh2022.adho.org/workshops-and-tutorials/wt-13 > We will show how to create DeezyMatch models that are trained on Arabic name variations and...

tutorial

Add option to extend the vocabulary when fine-tuning a model

Context: How do we deal with missing vocabulary when fine-tuning a model? This is particularly an issue with ngram/word tokenization (the `characters_v001.vocab` solves it in part for models using char tokenization).

enhancement

new feature
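To make the request concrete, here is a minimal sketch of what "extending the vocabulary" could mean for an ngram/word vocabulary. It is not DeezyMatch's API: the `extend_vocab` function, the `token_to_index` mapping, and the assumption that indices are contiguous from 0 are all hypothetical, chosen only for illustration.

```python
def extend_vocab(token_to_index, new_tokens):
    """Return a copy of the vocabulary with unseen tokens appended.

    Assumes (hypothetically) that indices are contiguous, 0..len-1,
    so each new token can take the next free index.
    """
    extended = dict(token_to_index)  # leave the original mapping untouched
    for token in new_tokens:
        if token not in extended:
            extended[token] = len(extended)
    return extended


# Vocabulary of the pretrained model (illustrative values only).
base_vocab = {"<pad>": 0, "ab": 1}

# Tokens seen in the fine-tuning data; "ab" is already known.
extended_vocab = extend_vocab(base_vocab, ["bc", "ab", "cd"])
print(extended_vocab)
```

In a real model, the embedding layer would also need new rows for the added indices; how those rows are initialised is a design decision this sketch leaves open.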

Create documentation on adapting DeezyMatch to a project

From the tutorial at Linked Pasts: https://github.com/LinkedPasts/LaNC-workshop/blob/main/deezymatch/recommendations.md

documentation

tutorial

Add option to print less verbose candidate ranking progress

The candidate ranker prints progress for each query like this: ![Screenshot 2020-12-17 at 10 55 59](https://user-images.githubusercontent.com/46483603/102479339-ae442a00-4056-11eb-99a7-62b44f1dd3fd.png) When we have lots of queries, Jupyter Notebook sometimes complains (IOPub data rate exceeded)....

enhancement
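One way the less verbose output requested above might work is to print a progress line only every N queries instead of once per query. The sketch below is generic Python, not the candidate ranker's actual code; `report_progress`, `rank_all`, and the `every` parameter are hypothetical names used for illustration.

```python
def report_progress(current, total, every=100):
    """Return a progress line every `every` items and for the final item, else None."""
    if current % every == 0 or current == total:
        return f"Processed {current}/{total} queries"
    return None


def rank_all(queries, every=100):
    """Loop over queries, printing throttled progress; returns the lines printed."""
    printed = []
    total = len(queries)
    for i, _query in enumerate(queries, start=1):
        # Placeholder: the actual candidate ranking for `_query` would run here.
        line = report_progress(i, total, every)
        if line is not None:
            print(line)
            printed.append(line)
    return printed


lines = rank_all(["query"] * 250, every=100)
```

With 250 queries and `every=100`, this prints three lines rather than 250, which keeps notebook output well below the rate at which the IOPub limit is typically hit.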