Daniel van Strien

Results: 138 comments by Daniel van Strien

Will write something on this next week :)

> Indeed. Would you like to open a PR for this?

Will do :)

@cakiki give me a shout if you want any help with this. I am quite familiar with this dataset :)

> I think the loading script should parse the XML files.
>
> CC: @davanstrien

I have a WIP script I have been working on for this. If it's...

Great - I will try and get the script finished today for use in BigScience. I might then hold off with a public script until we have the plain text...

@albertvillanova, sorry this took a bit longer. I did write a loading script, but because the XML processing is relatively slow for this data, the loading script was very slow,...

This is now available as a dataset here: https://huggingface.co/datasets/BritishLibraryLabs/EThOS-PhD-metadata.

Whilst this dataset should be fairly easy to add to the datasets hub, be aware that it is quite large.