kleineanfragen

Rotate pages before extraction

Open: robbi5 opened this issue 9 years ago • 1 comment

Some papers have broken, unsearchable extracted text because some of their pages should have been rotated before extraction.

Example: https://kleineanfragen.de/schleswig-holstein/18/406
Extracted Text: https://kleineanfragen.de/schleswig-holstein/18/406-gremienmitgliedschaften-der-regierungsmitglieder-und-staatssekretaere.txt

Fr
ag

e
n

 1
,3

 u
n

d
 4

:  
G

re
m

ie
n

 im
 S

in
n

e 
d

robbi5, Dec 23 '15 10:12

Apache TIKA Bug: https://issues.apache.org/jira/browse/TIKA-723 "Rotated text isn't extracted correctly from PDFs"

robbi5, Dec 26 '15 14:12
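
The extracted sample in the issue breaks into fragments of one to three characters per line. That suggests one possible way to find affected pages before deciding whether to rotate them. The following is only a minimal sketch of that idea, not anything from the project or from Tika: the function name `looks_rotated` and the parameters `max_fragment_len`, `threshold`, and `min_lines` are hypothetical, and the default values are guesses that would need tuning against real papers.

```python
def looks_rotated(text: str,
                  max_fragment_len: int = 3,
                  threshold: float = 0.8,
                  min_lines: int = 10) -> bool:
    """Heuristically flag extracted text that looks like it came from a rotated page.

    Assumption: rotated pages extract as many very short lines, as in the
    sample from this issue. All parameter defaults are illustrative guesses.
    """
    # Ignore blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Too little text to judge either way.
    if len(lines) < min_lines:
        return False
    # Count lines that are only a few characters long.
    short = sum(1 for line in lines if len(line) <= max_fragment_len)
    return short / len(lines) >= threshold


# Fragment taken from the extracted text quoted in this issue.
sample = "Fr\nag\n\ne\nn\n\n 1\n,3\n\n u\nn\n\nd\n 4\n"
print(looks_rotated(sample))
```

Pages flagged this way could then be rotated and re-extracted, or simply marked for manual review. The heuristic will miss rotated pages that happen to extract differently, and may flag tables with many short cells.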