htr-united

Add "file" counts for a few datasets

Open alix-tz opened this issue 2 years ago • 1 comments

@PonteIneptique do you have any objection to adding the following information to the catalog?

| Dataset name | (XML) file count |
| --- | --- |
| Handwritten Text Recognition Ground Truth Set: StABS Ratsbücher O10, Urfehdenbuch X | 201 |
| Charters and Records of Königsfelden Abbey and Bailiwick (1308-1662) | 283 |
| The POPP datasets | 235 |
| Eutyches | 129 |
| FoNDUE-GasparoSardiToponomasia-Dataset | 49 |
| FoNDUE Spanish chapbooks 19th c. Dataset | 198 |
| Éditer la correspondance de Constance de Salm (1767-1845) | 45 |
| Jeu de données OCR - Incunables sévillans 1494-1500 | 62 |
| Données vérité de terrain HTR+ Annuaire des propriétaires et des propriétés de Paris et du département de la Seine (1898-1923) | 169 |

I went through each of these repositories and counted the XML files that correspond to ground truth. Note that for "Handwritten Text Recognition Ground Truth Set: StABS Ratsbücher O10, Urfehdenbuch X", I counted only the PAGE files: every ALTO file has a PAGE equivalent, but not every PAGE file has an ALTO equivalent. The same applies to "Données vérité de terrain HTR+ Annuaire des propriétaires et des propriétés de Paris et du département de la Seine (1898-1923)".
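For anyone who wants to reproduce a count like this, here is a minimal sketch of one way to do it. It is not necessarily how the numbers above were obtained. It assumes a local clone of the repository and tells PAGE from ALTO by the root element name (`PcGts` for PAGE, `alto` for ALTO). The function names `root_local_name` and `count_ground_truth` are hypothetical, and any XML file that is neither PAGE nor ALTO is simply counted as "other".

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def root_local_name(path):
    """Return the root element name without its namespace, or None if the file does not parse."""
    try:
        with open(path, "rb") as handle:
            # The first "start" event is the root element, so we can stop right there.
            for _, elem in ET.iterparse(handle, events=("start",)):
                return elem.tag.rsplit("}", 1)[-1]
    except ET.ParseError:
        return None
    return None


def count_ground_truth(repo_dir):
    """Count XML files under repo_dir by format (assumption: format is given by the root element)."""
    counts = Counter()
    for xml_path in Path(repo_dir).rglob("*.xml"):
        name = root_local_name(xml_path)
        if name == "PcGts":
            counts["page"] += 1
        elif name == "alto":
            counts["alto"] += 1
        else:
            counts["other"] += 1
    return counts
```

Calling `count_ground_truth("path/to/clone")` returns a `Counter` with `page`, `alto`, and `other` keys. For a dataset that ships both formats, reporting only `counts["page"]` matches the choice described above of not counting the same document twice.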

If we add these metrics, we would have the "file" metric available for every dataset currently listed in the catalog.

alix-tz, Oct 31 '22 22:10