Daniel van Strien
@cakiki thanks for suggesting! I think for TEI it depends a little bit on how many of the fields we want to extract. If we want to keep more true...
@adikeinan, thanks for suggesting this! I just realised the dataset suggestion template was broken (it should be fixed now). Is it this dataset: https://bl.iro.bl.uk/collections/ffab19f8-e9b7-485c-a7c3-64aba143be9c?locale=en. I'll open a new issue for...
Sorry for coming to this discussion a bit late. My preference would be to include the tags/descriptions generated by cataloguers/Flickr users rather than add labels output from a generic ML model....
> A quick question on the metadata: Is it possible to upload multiple CSVs to an image dataset? I'm specifically thinking of the tag data, which I would have to...
I suggest leaving this as a candidate dataset until we have worked out the best approach. Tagging others who have been discussing this: @bmschmidt @stefan-it
## Data access

Currently, we have a few options for accessing the data:

- use the data from https://pro.europeana.eu/page/iiif#download
- use the API for access (do this once and save...
> Could you explain the notion of a "loading script"? I don't think I understand how the huggingface model--which seems to be basically organized hierarchically--works with something like this. This depends...
> FWIW, my solution for this was to break up newspapers into multiple files only when they got above a certain size. There are a lot of weekly or monthly...
Thanks so much for that @stefan-it. @bmschmidt @stefan-it, my suggested next step is to start with the smallest dataset from that dump to get to a format we're happy with....
> Hi, just to briefly chime in (I hope I can devote more time to this tomorrow) - I have a lot of background info, provenance and documentation about these...