
Add dataset: odeuropa_benchmarks_and_corpora

Open: davanstrien opened this issue 2 years ago • 1 comment

A URL for this dataset

https://github.com/Odeuropa/benchmarks_and_corpora

Dataset description

This dataset contains the annotations related to olfactory information from the benchmark created for the ODEUROPA project. For 7 languages we selected a pool of documents covering different time periods (from 1620 to 1925) and topics (e.g. medicine, law, literature).

This is an exciting dataset of annotations of olfactory (smell) information in historical documents. It is interesting both because it covers a wide range of periods and because it offers an opportunity to apply ML to a task other than standard entity recognition.

Dataset modality

Text

Dataset licence

Other license

Other licence

No response

How can you access this data

As a download from a repository/website

Confirm the dataset has an open licence

  • [X] To the best of my knowledge, this dataset is accessible via an open licence

Contact details for data custodian

No response

davanstrien, Jul 14 '22 13:07

I am clarifying the licence for this (see https://github.com/Odeuropa/benchmarks_and_corpora/issues/3), so I would hold off working on this until we have that information back.
