Daniel van Strien
Awesome, the only other thing I would check is that, when you download the images, we get sufficient metadata for each image to verify the licence/copyright. What information is downloaded...
Great, that looks good. I think if we can include the source information/URL that would be great. My own preference would also be to include as much information as possible...
> Last night, I scraped the pages from the website, by following the restrictions agreed upon. This is the resulting dataset, stored on the hub: [huggingface.co/datasets/gigant/oldbookillustrations_2](https://huggingface.co/datasets/gigant/oldbookillustrations_2) > > Do you...
Thanks so much for this. Having given this a bit more thought, I think it probably makes sense to try and filter out the items which may have copyright issues....
> Thanks so much for this. Having given this a bit more thought, I think it probably makes sense to try and filter out the items which may have copyright...
This sounds great, thanks for suggesting it! If you also want to work on adding this feel free to use the `#self-assign` command to assign yourself to work on this.
@ericleasemorgan I thought we could use this issue to discuss the best approach for this dataset further :)
I'll try and take a closer look at this again next week but some initial thoughts below: >Just to re-iterate, the next step is for me to write a little...
> Moved the dataset to the biglam organisation [biglam/bnl_ground_truth_newspapers_before_1878](https://huggingface.co/biglam/bnl_ground_truth_newspapers_before_1878) I think this got created as a model, so I've just moved it to a dataset. I think it could also...
Looks good. We could maybe also think about adding some more general guidance on working with IIIF images/manifests. I have some code for parsing manifests which I can try and...
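As a rough illustration of what parsing a manifest could involve (this is not the code mentioned in the comment above), here is a minimal sketch. It assumes a IIIF Presentation API 2.x manifest that has already been loaded into a Python dict, with images listed under `sequences` > `canvases` > `images` > `resource`. The function name `iter_image_urls`, the sample manifest, and the `example.org` URLs are hypothetical and only there to make the example self-contained.

```python
from typing import Iterator, Tuple


def iter_image_urls(manifest: dict) -> Iterator[Tuple[str, str]]:
    """Yield (canvas label, image URL) pairs from a IIIF Presentation 2.x manifest.

    Assumes the 2.x layout: sequences -> canvases -> images -> resource["@id"].
    Missing keys are skipped rather than raising an error.
    """
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            label = canvas.get("label", "")
            for image in canvas.get("images", []):
                url = image.get("resource", {}).get("@id")
                if url:
                    yield label, url


# Hypothetical, heavily trimmed manifest used only to exercise the function.
sample_manifest = {
    "@id": "https://example.org/iiif/book1/manifest",
    "sequences": [
        {
            "canvases": [
                {
                    "label": "p. 1",
                    "images": [
                        {
                            "resource": {
                                "@id": "https://example.org/iiif/img1/full/full/0/default.jpg"
                            }
                        }
                    ],
                },
                # A canvas without images is tolerated and yields nothing.
                {"label": "p. 2", "images": []},
            ]
        }
    ],
}

pairs = list(iter_image_urls(sample_manifest))
for label, url in pairs:
    print(label, url)
```

Keeping the canvas label next to each URL makes it easier to trace an image back to its place in the source item. Manifests that follow Presentation API 3.0 use a different structure (`items` instead of `sequences`/`canvases`), so this sketch would need adapting for them.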