Better OCR Overlay
NAPS is one of the fastest and most convenient tools I know for optimising and OCRing PDFs. However, the OCRed text often doesn't match the text in the underlying image.
This works quite well with https://github.com/UB-Mannheim/zotero-ocr, on the other hand. Maybe it would make sense to use the same approach? I don't know the technical details; it's just an idea.
Do you have an example PDF where the text doesn't match up?
It seems to happen when the document mixes different fonts. In the following example, the body text is correct, but the heading is not.
Edit: I've just noticed that the body text doesn't line up perfectly either. There is always extra space between the words.
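For context on the spacing issue: one common way to line up an invisible OCR text layer with the image is to place each recognized word on its own and stretch it horizontally (the PDF `Tz` operator) so it exactly fills its bounding box. Then the width of the space glyph no longer matters. As far as I know, Tesseract's own PDF output does something along these lines, but I haven't checked what NAPS or zotero-ocr do internally. The sketch below is only an illustration. It assumes a hypothetical fixed average glyph width of 0.5 em and a font resource named `/F1`. A real implementation would use the actual font metrics.

```python
def horizontal_scale(word, bbox_width_pt, font_size_pt, glyph_width_em=0.5):
    """Return the PDF Tz value (percent) that stretches `word` to fill its box.

    Assumption: every glyph is `glyph_width_em` em wide (a hypothetical
    simplification; real code would sum the font's advance widths).
    """
    natural_width = len(word) * glyph_width_em * font_size_pt
    if natural_width <= 0:
        return 100.0  # nothing to stretch; keep the default 100% scaling
    return 100.0 * bbox_width_pt / natural_width


def escape_pdf_string(text):
    """Escape characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def word_ops(word, x, y, bbox_width_pt, font_size_pt):
    """Build content-stream operators for one invisible, box-fitted word.

    3 Tr = invisible text render mode, Tz = horizontal scaling,
    Tm positions the word at the bottom-left (x, y) of its bounding box.
    "/F1" is a hypothetical font resource name.
    """
    tz = horizontal_scale(word, bbox_width_pt, font_size_pt)
    return (
        f"BT /F1 {font_size_pt:g} Tf 3 Tr {tz:.2f} Tz "
        f"1 0 0 1 {x:g} {y:g} Tm ({escape_pdf_string(word)}) Tj ET"
    )


if __name__ == "__main__":
    # A 5-letter word at 10 pt has a natural width of 25 pt under the
    # 0.5 em assumption, so a 40 pt box needs 160% horizontal scaling.
    print(word_ops("Hello", 72, 700, 40, 10))
```

Because each word gets its own position and scaling, gaps between words come from the OCR bounding boxes rather than from the font's space width. Text selection should then line up with the image even when the fonts differ.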