immich
[Feature]: OCR
Feature detail
In addition to object detection, it would be awesome to have images OCR'ed so that the text inside them can be searched and added to the metadata.
Platform
Server
Is this something that could be handled by a webhook into an ecosystem of ML containers?
I.e., on upload, a webhook is triggered; one or more individual ML containers register for it and do their thing: OCR, face detection, object detection, whatever is actually wanted or needed by the individual.
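To make the idea concrete, here is a minimal in-process sketch of that registration pattern. Everything in it is hypothetical: `WebhookRegistry`, the `asset.uploaded` event name, and the stub handlers are illustrations, not existing Immich APIs. In a real setup each handler would presumably be an HTTP POST to a separate container.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A handler takes the event payload and returns whatever metadata it produced.
Handler = Callable[[dict], dict]


class WebhookRegistry:
    """Hypothetical registry: ML services subscribe to named events."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Tuple[str, Handler]]] = defaultdict(list)

    def register(self, event: str, name: str, handler: Handler) -> None:
        self._handlers[event].append((name, handler))

    def trigger(self, event: str, payload: dict) -> Dict[str, dict]:
        """Call every handler registered for the event and collect results.

        A failing handler is recorded as an error so that one broken
        service does not block the others.
        """
        results: Dict[str, dict] = {}
        for name, handler in self._handlers.get(event, []):
            try:
                results[name] = handler(payload)
            except Exception as exc:
                results[name] = {"error": str(exc)}
        return results


# Stub handlers standing in for real ML containers (assumptions, not real models).
def ocr_stub(payload: dict) -> dict:
    return {"text": "placeholder text for " + payload["path"]}


def broken_stub(payload: dict) -> dict:
    raise RuntimeError("model not loaded")


registry = WebhookRegistry()
registry.register("asset.uploaded", "ocr", ocr_stub)
registry.register("asset.uploaded", "faces", broken_stub)
results = registry.trigger("asset.uploaded", {"asset_id": "a1", "path": "photo.jpg"})
```

Each user would then only register the services they actually want, and the core server would not need to know what any of them do.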
This is nice but out of scope of the project
I am using PaddleOCR to implement OCR and to support text retrieval in the app.
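For anyone wondering what the retrieval half could look like, below is a rough sketch of a tiny inverted index over OCR output. It assumes the OCR step (PaddleOCR or anything else) has already produced a plain string per asset; `OcrIndex` and the asset ids are made up for illustration and are not part of PaddleOCR or Immich.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class OcrIndex:
    """Hypothetical in-memory inverted index: token -> set of asset ids."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, asset_id: str, ocr_text: str) -> None:
        for token in tokenize(ocr_text):
            self._postings[token].add(asset_id)

    def search(self, query: str) -> Set[str]:
        """Return the assets whose OCR text contains every query token."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        matches = [self._postings.get(token, set()) for token in tokens]
        return set.intersection(*matches)


index = OcrIndex()
index.add("img-1", "Invoice No. 4711, due in March")
index.add("img-2", "Train ticket Berlin to Munich, March")
```

With this, `index.search("march")` matches both assets, while `index.search("invoice march")` matches only `img-1`. A real implementation would more likely lean on the database's full-text search than on an in-memory structure.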
Approaching this from a different angle: the Google Photos Android app saves locally [1] a fairly complete gphotos0.db (GB-large [2] for any sizeable number of assets), a sqlite3 database holding a lot of metadata for all the Google Photos assets in the account, including of course the OCRed strings. If we had an endpoint, or a simple workflow, no matter how hackish, to ingest this into Immich, it would mean a lot to power users coming from Google Photos.
[1] Albeit you'd generally need root to grab it. Alternatively, use an Android emulator with enough on it to install Google Photos, log in, let it sync the db, and then open the emulator's disk and access the file some way.
[2] This is what shows up as GBs taken by Google Photos even if you have nothing stored locally but many pictures online.
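As a first step towards such a workflow, one could simply dump the schema of a copied database to see which tables and columns might hold the OCR strings. The sketch below uses only Python's standard `sqlite3` module and makes no assumptions about the actual layout of gphotos0.db; `describe_database` is a hypothetical helper name.

```python
import sqlite3
from typing import Dict, List, Tuple


def describe_database(path: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each table name to its (column name, declared type) pairs.

    The file is opened read-only so the copied database is never modified.
    Paths containing characters such as '?' or '#' would need URI escaping.
    """
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema: Dict[str, List[Tuple[str, str]]] = {}
        for table in tables:
            # Double any embedded quotes so odd table names stay valid identifiers.
            quoted = table.replace('"', '""')
            columns = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
            # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
            schema[table] = [(col[1], col[2]) for col in columns]
        return schema
    finally:
        conn.close()
```

Running `describe_database("gphotos0.db")` on a copy of the file and looking for text-typed columns would be one way to locate the relevant data before writing any importer.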