
Add dataset: [bnl_newspapers1841-1879]

Open • ymaurer opened this issue 2 years ago • 1 comment

A URL for this dataset

https://data.bnl.lu/data/historical-newspapers/

Dataset description

630.709 articles from historical newspapers (1841-1879) along with metadata and the full text.

  • 21 newspaper titles
  • 24.415 newspaper issues
  • 99.957 scanned pages

Transcribed using a variety of OCR engines and corrected using https://github.com/natliblux/nautilusocr (95% threshold).

The newspapers used are:

  • Der Arbeiter (1878)
  • L'Arlequin (1848)
  • L'Avenir (1868-1871)
  • Courrier du Grand-Duché de Luxembourg (1844-1868)
  • Cäcilia (1863-1871)
  • Diekircher Wochenblatt (1841-1848)
  • Le Gratis luxembourgeois (1857-1858)
  • L'Indépendance luxembourgeoise (1871-1879)
  • Kirchlicher Anzeiger für die Diözese Luxemburg (1871-1879)
  • La Gazette du Grand-Duché de Luxembourg (1878)
  • Luxemburger Anzeiger (1856)
  • Luxemburger Bauernzeitung (1857)
  • Luxemburger Volks-Freund (1869-1876)
  • Luxemburger Wort (1848-1879)
  • Luxemburger Zeitung (1844-1845)
  • Luxemburger Zeitung = Journal de Luxembourg (1858-1859)
  • L'Union (1860-1871)
  • Das Vaterland (1869-1870)
  • Der Volksfreund (1848-1849)
  • Der Wächter an der Sauer (1849-1869)
  • D'Wäschfra (1868-1879)

Dataset modality

Text

Dataset licence

Creative Commons Public Domain Dedication and Certification

Other licence

No response

How can you access this data

As a download from a repository/website

Size of dataset

500MB-2GB

Confirm the dataset has an open licence

  • [X] To the best of my knowledge, this dataset is accessible via an open licence

Contact details for data custodian

[email protected]

ymaurer, Nov 14 '22 11:11

I'll add it to: https://huggingface.co/datasets/biglam/bnl_newspapers1841-1879
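For anyone who wants to try it from there, here is a minimal sketch of how the Hub copy might be loaded with the Hugging Face `datasets` library. The helper names `repo_id_from_url` and `load_bnl_newspapers` are hypothetical, and the `"train"` split name and the assumption that the repository loads with default settings are guesses, not something confirmed in this issue.

```python
from urllib.parse import urlparse


def repo_id_from_url(url: str) -> str:
    """Extract the 'namespace/name' repo id from a Hugging Face dataset URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expected path shape: /datasets/<namespace>/<name>
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return "/".join(parts[1:3])


def load_bnl_newspapers(url: str, split: str = "train"):
    """Load the dataset in streaming mode.

    Needs `pip install datasets` and network access. The split name
    "train" is an assumption; check the dataset card if it fails.
    """
    # Imported lazily so repo_id_from_url works without the dependency.
    from datasets import load_dataset

    # streaming=True avoids downloading the whole archive up front.
    return load_dataset(repo_id_from_url(url), split=split, streaming=True)


# Example (not run here, requires network):
# ds = load_bnl_newspapers(
#     "https://huggingface.co/datasets/biglam/bnl_newspapers1841-1879")
# first_article = next(iter(ds))
```

Streaming is used here only because the dataset is listed as 500MB-2GB; a plain download works as well if disk space is not a concern.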

ymaurer, Nov 14 '22 11:11