workflow_ocr
OCR workflow should maintain modification date of original file
Describe the bug
We plan to load historical PDF files into the database and want to make them searchable using the OCR workflow. The workflow changes the modification date of each file, so the important historical context carried by the modification date is lost, which limits the usability of this great feature.
The ocrmypdf maintainer confirms that ocrmypdf must change the modification date to comply with the standard.
For the OCR workflow I see 2 options:
- optionally restore the original modification date after adding the OCR layer.
- add the original modification date to the file name. Options:
  - prepend, using a format which allows sorting of the files, like "yyyymmdd"
  - append
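
To illustrate the first option, here is a minimal Python sketch of the idea on a local filesystem. It is not how workflow_ocr does or should implement it; the function name `process_preserving_mtime` and the `process` callback (standing in for the OCR step) are hypothetical.

```python
import os
from pathlib import Path
from typing import Callable


def process_preserving_mtime(path: Path, process: Callable[[Path], None]) -> None:
    """Run `process` on `path`, then put the original timestamps back.

    `process` is a hypothetical stand-in for the OCR step that rewrites the file.
    """
    stat = path.stat()
    # Remember access and modification time with nanosecond precision.
    original_times_ns = (stat.st_atime_ns, stat.st_mtime_ns)
    try:
        process(path)
    finally:
        # Restore the timestamps even if processing failed part-way.
        os.utime(path, ns=original_times_ns)
```

Note that this only restores the filesystem timestamp; any modification date that ocrmypdf writes inside the PDF metadata is left untouched.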
As a workaround, I have created a small Python script which prepends the original modification date to the name of every PDF file that does not already start with a date, but I want to clarify the situation before I proceed.
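
The script itself is not attached here; the following is only a rough sketch of what such a workaround could look like. The names `prefixed_name` and `prepend_mtime`, the "yyyymmdd_" prefix with an underscore separator, and the rule "eight leading digits count as a date" are assumptions made for illustration. It has to run before the OCR workflow touches the files.

```python
import re
from datetime import datetime
from pathlib import Path

# Assumption: a name that starts with eight digits already carries a date.
DATE_PREFIX = re.compile(r"^\d{8}")


def prefixed_name(path: Path) -> str:
    """Return the file name with its modification date (local time) prepended."""
    if DATE_PREFIX.match(path.name):
        return path.name  # already prefixed, leave unchanged
    stamp = datetime.fromtimestamp(path.stat().st_mtime).strftime("%Y%m%d")
    return f"{stamp}_{path.name}"


def prepend_mtime(directory: Path) -> list[Path]:
    """Rename every PDF directly inside `directory`; return the new paths."""
    renamed = []
    for pdf in sorted(directory.glob("*.pdf")):
        target = pdf.with_name(prefixed_name(pdf))
        # Skip files that need no change or whose target name is taken.
        if target == pdf or target.exists():
            continue
        pdf.rename(target)
        renamed.append(target)
    return renamed
```

Calling `prepend_mtime(Path("/path/to/pdfs"))` a second time renames nothing, because all names then start with a date.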
System
- App version: 1.29
- Nextcloud version: 29.0.3
How to reproduce
Steps to reproduce the behavior: trigger the OCR Workflow