alfresco-simple-ocr
Disabling page auto-rotations
Is there a way to keep running Alfresco Simple OCR (with pdfsandwich) on each new document version, so that page text remains searchable, while disabling the auto-rotation step for every version after 1.0? The business scenario is to prevent manual page rotations (i.e. corrections to an improper automatic orientation) from being repeatedly overridden by the automatic processing. Our current idea is to write logic that checks the document's version and applies auto-rotation only when appropriate. In other words, apply automatic page rotation to the very first version (1.0), but not to any subsequent version, where manual changes or corrections could have been made. Of course, this depends on whether we're able to pass an option to Simple OCR and/or pdfsandwich that conditionally disables the auto-rotation portion of the process. Is this possible? If so, do you know the code or command we would need to use?
Stepping back, we're also wondering whether you've heard of this problem before, and whether there are other approaches we should consider instead of the idea described above.
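To make the idea concrete, here is a minimal sketch of the version check we have in mind. It is illustrative only: the function names are hypothetical, the version label is assumed to be a string such as "1.0" or "1.1", and `no_rotate_args` is a placeholder for whatever option (if any) turns off auto-rotation, which is exactly the part we don't know.

```python
def should_auto_rotate(version_label):
    """Return True only for the initial version.

    Assumption: a missing label means the document has not been
    versioned yet, so it is treated like the first version.
    """
    if version_label is None:
        return True
    return version_label.strip() == "1.0"


def build_ocr_args(base_args, version_label, no_rotate_args):
    """Append the (hypothetical) no-rotation arguments for later versions.

    base_args      -- the OCR command line as it is built today
    no_rotate_args -- placeholder list; the real option is unknown to us
    """
    args = list(base_args)  # copy so the caller's list is not modified
    if not should_auto_rotate(version_label):
        args.extend(no_rotate_args)
    return args
```

With something like this, version 1.0 would be processed exactly as it is today, and only later versions would get the extra arguments.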
Thank you!
Here's more background:
Some kinds of scanned documents being uploaded cause anomalies: the automation logic is not able to determine the page rotation correctly. Auto-rotation is based on what the process finds on the page and how it believes the text should flow. But there are times when pages have text flowing in conflicting directions (i.e. one block of text goes one way and another block goes a different way), not to mention times when the text is handwritten rather than computer-generated. So, when the auto-rotation ends up being incorrect for understandable reasons, the user manually rotates the page and saves the change before adding annotations (via another third-party tool).
This results in a new document version in Alfresco, which in turn triggers Simple OCR / pdfsandwich to run again against the new version. The automatic process then reverses the user's manual correction and rotates the page back to the incorrect orientation. The next time a user views the document, they see the incorrect rotation again, plus an annotation layer that no longer corresponds to the proper coordinates of the page. At this point, manually rotating the page in the UI document viewer rotates the annotation incorrectly, often making it illegible. The problem repeats with every new version, and any annotations added (as they often will be) make it that much worse.