htr-united
Add "file" counts for a few datasets
@PonteIneptique, do you have any objection to adding the following information to the catalog?
Dataset name | XML file count |
---|---|
Handwritten Text Recognition Ground Truth Set: StABS Ratsbücher O10, Urfehdenbuch X | 201 |
Charters and Records of Königsfelden Abbey and Bailiwick (1308-1662) | 283 |
The POPP datasets | 235 |
Eutyches | 129 |
FoNDUE-GasparoSardiToponomasia-Dataset | 49 |
FoNDUE Spanish chapbooks 19th c. Dataset | 198 |
Éditer la correspondance de Constance de Salm (1767-1845) | 45 |
Jeu de données OCR - Incunables sévillans 1494-1500 | 62 |
Données vérité de terrain HTR+ Annuaire des propriétaires et des propriétés de Paris et du département de la Seine (1898-1923) | 169 |
I went through each of these repositories and counted the XML files containing ground truth. Note that for "Handwritten Text Recognition Ground Truth Set: StABS Ratsbücher O10, Urfehdenbuch X", I only counted the PAGE files: every ALTO file has a PAGE equivalent, but not the other way around. The same applies to "Données vérité de terrain HTR+ Annuaire des propriétaires et des propriétés de Paris et du département de la Seine (1898-1923)".
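The counting rule above (count PAGE files, and don't count ALTO files again when a dataset ships both formats) could be automated roughly as follows. This is a minimal sketch, not the method actually used for the figures in the table. It assumes each file declares the usual PRImA PAGE or LoC ALTO namespace on its root element, and the function names `classify_xml` and `count_ground_truth` are hypothetical.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

# Assumption: PAGE and ALTO files are recognised by their root-element namespace.
PAGE_NS_FRAGMENT = "primaresearch.org/PAGE"
ALTO_NS_FRAGMENT = "loc.gov/standards/alto"


def classify_xml(path: Path) -> str:
    """Return 'page', 'alto' or 'other' from the namespace of the root element."""
    try:
        with open(path, "rb") as handle:
            # Only the first 'start' event (the root element) is needed.
            for _, elem in ET.iterparse(handle, events=("start",)):
                tag = elem.tag  # e.g. '{http://www.loc.gov/standards/alto/ns-v4#}alto'
                if PAGE_NS_FRAGMENT in tag:
                    return "page"
                if ALTO_NS_FRAGMENT in tag:
                    return "alto"
                return "other"
    except ET.ParseError:
        pass  # malformed or non-XML content is not counted as ground truth
    return "other"


def count_ground_truth(directory: Path) -> dict:
    """Recursively count *.xml files under `directory`, grouped by format."""
    counts = {"page": 0, "alto": 0, "other": 0}
    for path in directory.rglob("*.xml"):
        counts[classify_xml(path)] += 1
    return counts
```

For a local clone of a dataset that ships both formats, `count_ground_truth(Path("path/to/clone"))["page"]` would give the figure to report. The `other` bucket catches XML that is not PAGE or ALTO (e.g. TEI or METS files), which should not be counted as ground truth.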
If we add these metrics, the "file" metric will be available for every dataset currently listed in the catalog.