Daniel van Strien comments

Results: 138 comments by Daniel van Strien

Add dataset: archives_parlementaires_revolution_francaise

@cakiki thanks for suggesting! I think for TEI it depends a little bit on how many of the fields we want to extract. If we want to keep more true...

Add dataset: [arabic_htr_groundtruth]

@adikeinan, thanks for suggesting this! I just realised the dataset suggestion template was broken (it should be fixed now). Is it this dataset: https://bl.iro.bl.uk/collections/ffab19f8-e9b7-485c-a7c3-64aba143be9c?locale=en. I'll open a new issue for...

Add dataset: biodiversity_heritage_library

Sorry for coming to this discussion a bit late. My preference would be to include the tags/descriptions generated by cataloguers/Flickr users rather than add labels output from a generic ML model....

Add dataset: biodiversity_heritage_library

> A quick question on the metadata: Is it possible to upload multiple CSVs to an image dataset? I'm specifically thinking of the tag data, which I would have to...

Add dataset: europeana_newspapers

I suggest leaving this as a candidate dataset until we have worked out the best approach. Tagging others who have been discussing this: @bmschmidt @stefan-it

Add dataset: europeana_newspapers

## Data access

Currently, we have a few options for accessing the data:
- use the data from https://pro.europeana.eu/page/iiif#download
- use the API for access (do this once and save...

Add dataset: europeana_newspapers

> Could you explain the notion of a "loading script"? I don't think I understand how the huggingface model--which seems to be basically organized hierarchically--works with something like this.

This depends...

Add dataset: europeana_newspapers

> FWIW, my solution for this was to break up newspapers into multiple files only when they got above a certain size. There are a lot of weekly or monthly...

Add dataset: europeana_newspapers

Thanks so much for that @stefan-it. @bmschmidt @stefan-it, my suggested next step is to start with the smallest dataset from that dump to get to a format we're happy with....

Add dataset: europeana_newspapers

> Hi, just to briefly chime in (I hope I can devote more time to this tomorrow) - I have a lot of background info, provenance and documentation about these...