media-experiments

OCR

Open swissspidy opened this issue 2 years ago • 3 comments

PDF.js could be combined with Tesseract.js (https://tesseract.projectnaptha.com/) to do OCR on uploaded PDFs. It just needs a good use case.

Apparently the underlying Tesseract models haven't been updated in a while, so we may need to find alternatives.

swissspidy, Mar 31 '24 09:03

PDF.js can actually extract text from PDFs already, so OCR might be more useful for images.
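
For reference, a rough sketch of what using that built-in extraction could look like. It assumes the entries come from PDF.js `page.getTextContent()` (its `items` array), where text entries carry a `str` field and an optional `hasEOL` flag. The helper names `joinTextItems` and `hasUsableTextLayer`, and the `minChars` threshold, are made up for illustration and are not part of PDF.js or this plugin.

```typescript
// Minimal shape of a text entry as returned in `(await page.getTextContent()).items`.
// Only the two fields used here are modelled.
interface TextItemLike {
  str: string;
  hasEOL?: boolean;
}

// Marked-content entries can appear in the same array and carry no text.
interface MarkedContentLike {
  type: string;
  id?: string;
}

type ContentItem = TextItemLike | MarkedContentLike;

function isTextItem(item: ContentItem): item is TextItemLike {
  return typeof (item as TextItemLike).str === 'string';
}

// Hypothetical helper: concatenates the text runs of one page,
// inserting a line break wherever PDF.js flags an end of line.
function joinTextItems(items: ContentItem[]): string {
  let out = '';
  for (const item of items) {
    if (!isTextItem(item)) {
      continue; // Skip marked-content markers.
    }
    out += item.str;
    if (item.hasEOL) {
      out += '\n';
    }
  }
  return out.trim();
}

// Hypothetical heuristic: a scanned PDF typically yields little or no text,
// in which case OCR would still be needed. The threshold is arbitrary.
function hasUsableTextLayer(text: string, minChars = 20): boolean {
  return text.replace(/\s/g, '').length >= minChars;
}
```

If `hasUsableTextLayer` returns `false` for every page, the PDF is probably a scan, which is the one PDF case where OCR would still add something.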

swissspidy, Aug 20 '24 08:08

For images, it could be interesting to extract text during upload and then store it as metadata. That would be useful for searching the media library.
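
A minimal sketch of the "store as metadata" step, assuming the OCR engine hands back a text string plus a confidence score on a 0 to 100 scale (Tesseract.js reports something along those lines in its `recognize()` result, but treat that as an assumption). The `ocr_text` and `ocr_confidence` field names, the `toOcrMetadata` helper, and both default thresholds are hypothetical and not an existing API of this plugin.

```typescript
// Assumed shape of what an OCR engine returns for one image.
interface OcrResult {
  text: string;
  confidence: number; // Assumed to be on a 0-100 scale.
}

// Hypothetical metadata fields that could be saved alongside the attachment.
interface OcrMetadata {
  ocr_text: string;
  ocr_confidence: number;
}

// Normalizes raw OCR output into something worth indexing for search.
// Returns null when there is no text or the engine was not confident enough.
function toOcrMetadata(
  result: OcrResult,
  minConfidence = 60,
  maxLength = 5000
): OcrMetadata | null {
  // Collapse line breaks and repeated spaces into single spaces.
  const text = result.text.replace(/\s+/g, ' ').trim();

  if (text === '' || result.confidence < minConfidence) {
    return null;
  }

  return {
    ocr_text: text.slice(0, maxLength), // Cap the size of the stored value.
    ocr_confidence: Math.round(result.confidence),
  };
}
```

Dropping low-confidence results keeps noise out of the search index; photos without any real text tend to produce short garbage strings.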

swissspidy, Aug 20 '24 08:08

Related: #647

swissspidy, Sep 13 '24 13:09