heliotrope
Investigate/build support for separate text file
As a publisher, I may provide a separate text file that is more accessible than the original file. For example, I provide PDFs that consist of bundled page scans and cannot be made accessible, and the OCR text generated from such a PDF is not accessible either.
I need to be able to upload a separate, cleaned-up or rekeyed, accessible text file to the platform.
- [ ] This file would be associated with the original PDF file (not replacing it).
- [ ] This file would be downloadable.
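
To make the two requirements above concrete, here is a minimal sketch of one way the association could be modeled. Everything in it is hypothetical: `ScannedPdfRecord`, its fields, and its methods are illustrative names, not the platform's actual models or API. The only points it illustrates are that the uploaded text file sits alongside the PDF rather than replacing it, and that the download can prefer the publisher-supplied file when one exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScannedPdfRecord:
    """Hypothetical record for illustration; not the platform's real model."""

    pdf_filename: str                               # original scanned PDF, always kept
    ocr_text_filename: Optional[str] = None         # auto-generated OCR text, if any
    accessible_text_filename: Optional[str] = None  # publisher-supplied rekeyed text

    def attach_accessible_text(self, filename: str) -> None:
        # Associate the cleaned-up text with the PDF; the PDF itself is untouched.
        self.accessible_text_filename = filename

    def downloadable_text(self) -> Optional[str]:
        # Prefer the publisher-supplied file and fall back to the OCR text.
        return self.accessible_text_filename or self.ocr_text_filename
```

Whether the publisher-supplied file should take priority over the OCR text, or be offered as a second download, is an open design choice that this sketch does not settle.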
Questions
- [ ] Need to determine if the download button should say something other than "Download OCR Text".
- [ ] Need to determine if the disclaimer message would appear (see ticket #1429).
- [ ] Is it possible to include this text in the PDF so that it becomes part of the auto-generated OCR text?
For Turner, we need to get either the rekeyed file or a better OCR-generated file (through Prime OCR).
We are waiting to get the rekeyed text file. The ability to replace the Extracted File text file should be ready(ish).