
Rotating pages - provide a way to rotate pages without OCR, tesseract-timeout=0

Open cbwheadon opened this issue 3 years ago • 2 comments

Rotation is detected but the page isn't rotated

I'd like to rotate certain pages before deskewing them, but although the incorrect rotation seems to be detected, no action is taken.

To Reproduce

ocrmypdf -d --tesseract-timeout=0 --optimize 0 --rotate-pages -v1 skew-rotate.pdf srv1.pdf

The output is:

2 with existing rotation ⇨, page is facing ⇧, confidence 0.00 - no change

Example file

I have attached the input PDF: skew-rotate.pdf

Expected behavior

The skew is dealt with beautifully, but page 2 is still rotated, and I would love to be able to rotate and then deskew in one go! I have tried changing the confidence threshold, but with a reported confidence of 0.00 I don't think that will have any effect. Thank you so much for a great package and your time!

System

  • macOS
  • OCRmyPDF Version: 11.4.5
  • brew install

cbwheadon avatar May 13 '21 12:05 cbwheadon

With --tesseract-timeout 0, orientation detection does not run, because Tesseract does not have enough time to attempt it. There is currently no combination of settings that does rotation without OCR, although it could be achieved with a simple plugin (see tests/plugins/tesseract_noop.py).
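
To illustrate why changing the confidence threshold cannot help here, below is a toy model of the decision, not OCRmyPDF's actual code. It assumes that a detector given no time budget reports an angle of 0 with confidence 0.0, and that a correction is applied only when the confidence meets the threshold. The names `Orientation`, `detect_orientation`, and `correction`, and the sample values 90 and 12.5, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orientation:
    """Hypothetical stand-in for an orientation-detection result."""
    angle: int         # detected rotation in degrees: 0, 90, 180 or 270
    confidence: float  # 0.0 means the detector has no information


def detect_orientation(timeout: float) -> Orientation:
    """Toy detector. ASSUMPTION: with no time budget it reports nothing."""
    if timeout <= 0:
        return Orientation(angle=0, confidence=0.0)
    # Made-up result for a page lying on its side.
    return Orientation(angle=90, confidence=12.5)


def correction(result: Orientation, threshold: float) -> int:
    """Return the degrees to rotate by; 0 means leave the page unchanged."""
    if result.confidence >= threshold:
        return result.angle % 360
    return 0


# With a zero timeout no threshold helps: the reported angle is already 0.
print(correction(detect_orientation(timeout=0), threshold=0.0))    # 0
# With a time budget, the correction depends on the threshold.
print(correction(detect_orientation(timeout=180), threshold=5.0))  # 90
```

Under these assumptions, lowering the threshold only matters once the detector has actually had time to run.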

I suppose there may be a case for some sort of no OCR option. Maybe ocrmypdf --actually-dont-ocr. Hmm...

jbarlow83 avatar May 13 '21 19:05 jbarlow83

Ah, yes, thank you, if I turn on the OCR it does a great job of de-skewing and rotating.

cbwheadon avatar May 14 '21 08:05 cbwheadon