
Simultaneously use ALTO and PAGE XML in a dataset?

Open alix-tz opened this issue 3 years ago • 2 comments

I might consider doing this with LECTAUREP, but I wonder what would be the best approach and how this would impact documenting the volumes and the dataset.

For example, I could use 2 different folders (/data/alto and /data/page), but then how would I declare the format in htr-united.yml, and would it be possible to break down the volume of files per XML format (like files.alto = 100 and files.page = 100 instead of files = 200)?
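To make the per-format volumes concrete, here is a minimal sketch of how the counts could be computed for the two-folder layout. It assumes the files sit under data/alto and data/page and all end in .xml; the function name `count_files_per_format` is hypothetical, and the resulting keys only mirror the files.alto / files.page idea above, not anything the htr-united.yml schema is known to support.

```python
from pathlib import Path


def count_files_per_format(data_root, formats=("alto", "page")):
    """Count the .xml files found in each per-format subfolder of data_root.

    Assumption: one subfolder per format (data/alto, data/page).
    """
    root = Path(data_root)
    counts = {}
    for fmt in formats:
        folder = root / fmt
        if folder.is_dir():
            # Search recursively so nested sub-collections are included.
            counts[fmt] = sum(1 for _ in folder.rglob("*.xml"))
        else:
            # A missing folder counts as zero files.
            counts[fmt] = 0
    return counts


if __name__ == "__main__":
    per_format = count_files_per_format("data")
    print(per_format)
    print("total:", sum(per_format.values()))
```

If both folders hold the same pages in two serializations, the total double-counts the underlying documents, which is part of why a single files = 200 figure would be misleading.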

Other options could include:

  • creating two different repositories (ex: lectaurep-bronod-alto and lectaurep-bronod-page) but then, on top of doubling the actions to maintain the dataset, wouldn't it artificially expand the number of datasets in HTR-United's catalog?
  • documenting only one of the two formats (but then a user looking for a dataset in the format I didn't document would miss my dataset)

I don't find any of these options really satisfying. @PonteIneptique, do you have any opinion?

alix-tz, Feb 24 '22 13:02