OCR workflow should maintain modification date of original file

Open ferdiga opened this issue 7 months ago • 3 comments

Describe the bug

We plan to load historical PDF files into the database and want to make them searchable using the OCR workflow. However, the workflow changes the modification date of each file, so the important historical context carried by that date is lost, which limits the usability of this great feature.

The ocrmypdf maintainer confirms that ocrmypdf must change the modification date to comply with the standard.

For the OCR workflow I see 2 options:

  • optionally restore the original modification date after adding the OCR layer.
  • add the original modification date to the file name, either prepended in a sortable format such as "yyyymmdd" or appended.
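
To make the first option concrete, here is a minimal Python sketch of the idea at the plain filesystem level. It is not how the OCR workflow app or ocrmypdf is implemented: `run_preserving_mtime` and its `operation` argument are hypothetical names used only for illustration, and a Nextcloud integration would presumably have to go through Nextcloud's own file API rather than touching the file on disk directly.

```python
import os
from pathlib import Path
from typing import Callable


def run_preserving_mtime(path: Path, operation: Callable[[Path], None]) -> None:
    """Run `operation` on `path`, then put the original timestamps back.

    `operation` is a hypothetical stand-in for whatever rewrites the file
    in place (for example, a step that adds an OCR text layer).
    """
    # Record access and modification times (nanosecond precision) first.
    before = path.stat()

    # The operation rewrites the file, which updates its modification time.
    operation(path)

    # Restore the recorded timestamps on the rewritten file.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
```

Note that this only restores the filesystem timestamp. Any modification date stored inside the PDF metadata itself would still reflect the OCR run.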

As a workaround, I have written a small Python script that prepends the original modification date to every PDF file whose name does not already start with a date, but I want to clarify the situation before I proceed.
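
For readers who want to try the same workaround, a rename of that kind might look roughly like the sketch below. It is an illustration only, not the script mentioned above: the function names, the `yyyymmdd_` prefix with an underscore separator, the use of UTC, and the restriction to lowercase `*.pdf` files in a single directory are all assumptions.

```python
import re
from datetime import datetime, timezone
from pathlib import Path

# Assumed convention: a name counts as "already dated" if it starts with 8 digits.
DATE_PREFIX = re.compile(r"^\d{8}")


def dated_name(name: str, mtime: float) -> str:
    """Return `name` prefixed with the modification date as yyyymmdd (UTC)."""
    if DATE_PREFIX.match(name):
        return name  # already starts with a date, leave it unchanged
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y%m%d")
    return f"{stamp}_{name}"


def prepend_mtime(directory: Path) -> list[Path]:
    """Rename every *.pdf in `directory` that lacks a date prefix."""
    renamed = []
    for pdf in sorted(directory.glob("*.pdf")):
        target = pdf.with_name(dated_name(pdf.name, pdf.stat().st_mtime))
        # Skip files that are already dated and never overwrite an existing file.
        if target != pdf and not target.exists():
            pdf.rename(target)
            renamed.append(target)
    return renamed
```

Running this before the OCR workflow is triggered keeps the original date visible in the file name even after the modification date has changed.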

System

  • App version: 1.29
  • Nextcloud version: 29.0.3

How to reproduce

Steps to reproduce the behavior: trigger the OCR Workflow

ferdiga • Jul 22 '24 09:07