kleineanfragen
Rotate pages before extraction
Some papers have broken, unsearchable text because pages containing rotated content were not rotated upright before text extraction.
Example: https://kleineanfragen.de/schleswig-holstein/18/406
Extracted text (excerpt below): https://kleineanfragen.de/schleswig-holstein/18/406-gremienmitgliedschaften-der-regierungsmitglieder-und-staatssekretaere.txt
Fr
ag
e
n
1
,3
u
n
d
4
:
G
re
m
ie
n
im
S
in
n
e
d
Related Apache Tika bug: https://issues.apache.org/jira/browse/TIKA-723 ("Rotated text isn't extracted correctly from PDFs")
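
Until pages are rotated before extraction, it might help to at least find the affected papers. Below is a rough sketch of a heuristic, not anything kleineanfragen or Tika currently does: it flags extracted text in which most non-empty lines are only one or two characters long, as in the excerpt above. The function name `looks_rotated` and the values for `max_len`, `threshold` and `min_lines` are assumptions chosen for illustration and would need tuning against real papers.

```python
def looks_rotated(text: str, max_len: int = 2, threshold: float = 0.6,
                  min_lines: int = 10) -> bool:
    """Guess whether extracted text came from a rotated page.

    Hypothetical heuristic: rotated text tends to come out as a column of
    one- or two-character fragments, so we measure the share of such lines.
    """
    # Ignore blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    # Too little text to judge; treat as not rotated.
    if len(lines) < min_lines:
        return False

    short = sum(1 for line in lines if len(line) <= max_len)
    return short / len(lines) >= threshold
```

Papers flagged this way could be queued for re-extraction once a rotation step exists. Tables with many short cells may trigger false positives, so the result is a hint only.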