Daniel van Strien
cc @EziOzoani @meg-huggingface
Will write something on this next week :)
> Indeed. Would you like to open a PR for this?

Will do :)
@cakiki give me a shout if you want any help with this. I am quite familiar with this dataset :)
> I think the loading script should parse the XML files.
>
> CC: @davanstrien

I have a WIP script I have been working on for this. If it's...
Great - I will try and get the script finished today for use in BigScience. I might then hold off with a public script until we have the plain text...
@albertvillanova, sorry this took a bit longer. I did write a loading script, but because the XML processing is relatively slow for this data, the loading script was very slow,...
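For context on the XML slowness mentioned above, one common way to keep XML processing manageable is to stream the file rather than build the whole tree in memory. The following is only a minimal sketch of that general technique using the standard library's `xml.etree.ElementTree.iterparse`; it is not the loading script discussed in this thread. The `record`, `title`, and `text` element names and the `iter_records` helper are hypothetical, since the real files' schema is not shown here.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, record_tag="record"):
    """Yield one dict per record element while streaming through the XML.

    `record_tag` is a hypothetical element name, not the dataset's real schema.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            # Collect the text of each direct child, keyed by its tag name.
            yield {child.tag: (child.text or "").strip() for child in elem}
            # Drop the element's children once processed to keep memory use low.
            elem.clear()


# Hypothetical sample input, used only to illustrate the approach.
SAMPLE = b"""<records>
  <record><title>First</title><text>Hello</text></record>
  <record><title>Second</title><text>World</text></record>
</records>"""

rows = list(iter_records(io.BytesIO(SAMPLE)))
```

Because `iter_records` is a generator, a loading script could yield examples one at a time instead of holding every parsed record at once. Whether this would be fast enough for the data discussed here is untested.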
This is now available as a dataset here: https://huggingface.co/datasets/BritishLibraryLabs/EThOS-PhD-metadata.
Whilst this dataset should be fairly easy to add to the datasets hub, be aware that it is quite large.
#self-assign