feat: add OCR functionality and related configurations
Description
- Added new OCR-related translations in English and Simplified Chinese
- Integrated PaddleOCR for text recognition in images
- Enhanced search capabilities by adding an “OCR” search type
- Implemented backend support for OCR processing, including database schema updates and service integration
- Updated frontend components to support OCR settings and search filters, including mobile OCR search-filter functionality
How Has This Been Tested?
- [x] Tested on a fresh full deployment
- [x] Tested on an incremental / rolling upgrade
Screenshots (if appropriate)
https://github.com/user-attachments/assets/d61cd4ed-57fa-4518-abee-d0cb577c31a4
https://github.com/user-attachments/assets/4ed37925-a068-42be-abbf-cac0be2b97f7
Checklist:
- [x] I have performed a self-review of my own code
- [x] I have made corresponding changes to the documentation if applicable
- [x] I have no unrelated changes in the PR.
- [x] I have confirmed that any new dependencies are strictly necessary.
- [x] I have written tests for new code (if applicable)
- [x] I have followed naming conventions/patterns in the surrounding code
- [x] All code in `src/services/` uses repository implementations for database calls, filesystem operations, etc.
- [x] All code in `src/repositories/` is pretty basic/simple and does not have any Immich-specific logic (that belongs in `src/services/`)
I rebased and made some changes:
- I switched the library to RapidOCR, which means it uses ONNX Runtime and hence has wider HWA compatibility
- The response is now sent as columnar arrays since the number of results can be quite high
- The detection and recognition scores are both stored in the DB
- The box positions are normalized so they're independent of the resolution at which the image was processed
- Changed the settings options to have a detection threshold, recognition threshold and max resolution, which I think are the main parameters an admin would care about
- Indexed the search table with a trigram index
- Removed the OCR endpoint in favor of adding an `ocr` field to the existing endpoints
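The normalization and columnar-response points above can be sketched as follows. This is an illustrative sketch with hypothetical names, not Immich's actual code: box corners are divided by the processed image's dimensions so they land in [0, 1], and per-line results are packed into parallel arrays, which stays compact when the number of detected lines is high.

```python
# Hypothetical sketch (names are illustrative, not Immich's actual code) of
# normalizing OCR box coordinates and packing results as columnar arrays.

def normalize_box(box, width, height):
    """Scale absolute pixel corners to [0, 1] so stored positions are
    independent of the resolution the image was processed at."""
    return [(x / width, y / height) for x, y in box]

def to_columnar(results):
    """Convert a list of per-line OCR results into parallel arrays."""
    return {
        "text": [r["text"] for r in results],
        "detectionScore": [r["det"] for r in results],
        "recognitionScore": [r["rec"] for r in results],
        "box": [r["box"] for r in results],
    }

# Two detected text lines in a 200x80 image.
results = [
    {"text": "hello", "det": 0.91, "rec": 0.88,
     "box": normalize_box([(0, 0), (100, 0), (100, 40), (0, 40)], 200, 80)},
    {"text": "world", "det": 0.85, "rec": 0.80,
     "box": normalize_box([(0, 40), (200, 40), (200, 80), (0, 80)], 200, 80)},
]
columnar = to_columnar(results)
```

Storing both detection and recognition scores lets the admin-facing thresholds be applied at query time rather than baked in at processing time.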
Still to do: update the mobile app, add tests, and do more testing. The defaults may also need to be tweaked; it is fairly slow with these settings.
Hi, on this fork, when using NVIDIA CUDA with 12 GB of VRAM, I noticed that after a while the GPU runs out of memory:
```
immich_machine_learning | ONNXRuntimeError: Traceback (most recent call last):
immich_machine_learning |   File "/opt/venv/lib/python3.11/site-packages/rapidocr/inference_engine/onnxruntime/main.py", line 90, in __call__
immich_machine_learning |     return self.session.run(self.get_output_names(), input_dict)[0]
immich_machine_learning |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
immich_machine_learning |   File "/opt/venv/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
immich_machine_learning |     return self._sess.run(output_names, input_feed, run_options)
immich_machine_learning |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
immich_machine_learning | onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Resize node. Name:'Resize.6' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 802160640
```
I wonder if there is a known memory leak issue, or overly aggressive caching, on the RapidOCR side?
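The allocator named in the traceback is ONNX Runtime's BFC arena, which by default grows to fit the largest inputs it has seen and does not shrink. One thing that might be worth trying (an assumption, not something verified against this fork, and it depends on RapidOCR exposing a way to pass provider options through) is constraining the arena via the standard CUDA execution provider options:

```python
# Provider options for ONNX Runtime's CUDA execution provider that constrain
# the BFC arena (the allocator named in the traceback). These are documented
# onnxruntime options; whether RapidOCR lets you pass them through is an
# assumption worth checking.
gpu_mem_limit_bytes = 8 * 1024 * 1024 * 1024  # cap the arena at 8 GiB

providers = [
    (
        "CUDAExecutionProvider",
        {
            "device_id": 0,
            # Grow the arena only by the requested amount instead of doubling,
            # which lowers peak usage when input sizes vary a lot.
            "arena_extend_strategy": "kSameAsRequested",
            "gpu_mem_limit": gpu_mem_limit_bytes,
        },
    ),
    "CPUExecutionProvider",  # fallback
]

# With onnxruntime installed, the list would be passed as:
# session = onnxruntime.InferenceSession(model_path, providers=providers)
```

Since OCR input sizes vary per image (especially with a max-resolution setting), `kSameAsRequested` can make arena growth much less bursty than the default doubling strategy.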
Have you changed any of the OCR settings or concurrency from default?
Hi, not much. It was at the defaults. Yes, my setup has multiple GPUs, but the issue persists regardless of the OCR concurrency setting. However, I did find a solution, although I don't have an explanation for it. The OOM issue disappears when these environment variables are set:
`MACHINE_LEARNING_WORKERS=1 MACHINE_LEARNING_REQUEST_THREADS=1`
Despite using only one GPU, this setup is still much faster than CPU, so it's what I use now.
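For a docker-compose deployment, the two variables can be set on the machine-learning service; a minimal fragment (the service name is assumed to match the default compose file, adjust to yours):

```yaml
services:
  immich-machine-learning:
    environment:
      MACHINE_LEARNING_WORKERS: 1
      MACHINE_LEARNING_REQUEST_THREADS: 1
```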
Edit: using the reverse proxy at https://github.com/apetersson/immich_ml_balancer and keeping to the same principle of ensuring each GPU only runs a single request thread (the parameters above, plus pinning a specific GPU for each Immich ML service), all GPUs can now be utilized without out-of-memory issues.
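The balancing principle described above can be illustrated with a toy sketch (this is not the linked balancer's code; the endpoint names are hypothetical): each ML endpoint is pinned to one GPU and serves one request at a time, and incoming requests are spread round-robin across the endpoints.

```python
# Toy illustration of the balancing principle: one GPU-pinned ML endpoint per
# GPU, requests assigned round-robin. Endpoint URLs are hypothetical.
from itertools import cycle

endpoints = [
    "http://immich-ml-gpu0:3003",  # hypothetical service pinned to GPU 0
    "http://immich-ml-gpu1:3003",  # hypothetical service pinned to GPU 1
]
next_endpoint = cycle(endpoints)

def assign(requests):
    """Map each request to the next endpoint in round-robin order."""
    return [(req, next(next_endpoint)) for req in requests]

assignments = assign(["img-a", "img-b", "img-c", "img-d"])
```

Combined with `MACHINE_LEARNING_WORKERS=1` and `MACHINE_LEARNING_REQUEST_THREADS=1` on each service, this keeps each GPU at a single in-flight request while still using all GPUs.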