
feat: add OCR functionality and related configurations


Description

feat: add OCR functionality and related configurations

  • Added new OCR-related translations in English and Simplified Chinese
  • Integrated PaddleOCR for text recognition in images
  • Enhanced search capabilities by adding an “OCR” search type
  • Implemented backend support for OCR processing, including database schema updates and service integration
  • Updated frontend components to support OCR settings and search filters, including mobile OCR search-filter functionality

How Has This Been Tested?

  • [x] Tested on a fresh full deployment
  • [x] Tested on an incremental / rolling upgrade

Screenshots (if appropriate)


https://github.com/user-attachments/assets/d61cd4ed-57fa-4518-abee-d0cb577c31a4

https://github.com/user-attachments/assets/4ed37925-a068-42be-abbf-cac0be2b97f7

Checklist:

  • [x] I have performed a self-review of my own code
  • [x] I have made corresponding changes to the documentation if applicable
  • [x] I have no unrelated changes in the PR.
  • [x] I have confirmed that any new dependencies are strictly necessary.
  • [x] I have written tests for new code (if applicable)
  • [x] I have followed naming conventions/patterns in the surrounding code
  • [x] All code in src/services/ uses repositories implementations for database calls, filesystem operations, etc.
  • [x] All code in src/repositories/ is pretty basic/simple and does not have any immich specific logic (that belongs in src/services/)

flipped-1121 avatar Jun 01 '25 14:06 flipped-1121

I rebased and made some changes:

  • I switched the library to RapidOCR, which uses ONNX Runtime and hence has wider hardware-acceleration (HWA) compatibility
  • The response is now sent as columnar arrays since the number of results can be quite high
  • The detection and recognition scores are both stored in the DB
  • The box positions are normalized so they're independent of the resolution at which the image was processed
  • Changed the settings to expose a detection threshold, a recognition threshold, and a max resolution, which I think are the main parameters an admin would care about
  • Indexed the search table with a trigram index
  • Removed the ocr endpoint in favor of adding an ocr field to the existing endpoints

Still to do: update the mobile app, add tests, and do more testing. The defaults may also need to be tweaked - it is fairly slow with these settings.
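The box normalization and columnar response described above can be illustrated with a small sketch. The helper names and field names here are hypothetical (the actual Immich implementation may differ); the point is that boxes are scaled into [0, 1] so they are resolution-independent, and per-result objects are transposed into parallel arrays to keep large payloads compact:

```python
def normalize_box(box, width, height):
    """Scale pixel-space corner points into [0, 1] so the stored
    coordinates are independent of the processed resolution."""
    return [(x / width, y / height) for x, y in box]

def to_columnar(results):
    """Transpose a list of per-word results into parallel arrays,
    which is more compact to send when there are many detections."""
    return {
        "text": [r["text"] for r in results],
        "detScore": [r["det_score"] for r in results],
        "recScore": [r["rec_score"] for r in results],
        "box": [r["box"] for r in results],
    }

# Example: one detection in a 1920x1080 image
results = [{
    "text": "EXIT",
    "det_score": 0.93,
    "rec_score": 0.88,
    "box": [(192, 108), (384, 108), (384, 216), (192, 216)],
}]
results[0]["box"] = normalize_box(results[0]["box"], 1920, 1080)
payload = to_columnar(results)
print(payload["box"][0])  # corner points now in [0, 1]
```

Storing both detection and recognition scores, as the PR does, lets the thresholds be applied independently: a box can be confidently detected but poorly recognized, and vice versa.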

mertalev avatar Jun 16 '25 18:06 mertalev


Hi, on this fork, when using Nvidia CUDA with 12 GB of VRAM, I noticed that after a while the GPU runs out of memory:

immich_machine_learning | ONNXRuntimeError: Traceback (most recent call last):
immich_machine_learning |   File "/opt/venv/lib/python3.11/site-packages/rapidocr/inference_engine/onnxruntime/main.py", line 90, in __call__
immich_machine_learning |     return self.session.run(self.get_output_names(), input_dict)[0]
immich_machine_learning |   File "/opt/venv/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
immich_machine_learning |     return self._sess.run(output_names, input_feed, run_options)
immich_machine_learning | onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Resize node. Name:'Resize.6' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 802160640

I wonder if there is a known memory leak or overly aggressive caching on the RapidOCR side?
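For context on why the PR's max-resolution setting is relevant to a failure like this: the failing Resize node allocates an output buffer proportional to the scaled image area (the ~800 MB request above), so capping the longer side of the input before inference also caps that allocation. A stdlib-only sketch of that kind of downscaling calculation (illustrative only, not Immich's actual code, and the default of 736 is an assumption):

```python
def capped_size(width, height, max_side=736):
    """Return (w, h) scaled so the longer side is at most max_side,
    preserving aspect ratio. No-op if the image is already small enough."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))

# An 8000x6000 scan would otherwise produce very large intermediate
# buffers inside the model; memory use shrinks quadratically with the cap.
print(capped_size(8000, 6000))  # -> (736, 552)
print(capped_size(640, 480))    # -> (640, 480), unchanged
```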

dodysw3 avatar Oct 24 '25 02:10 dodysw3

Hi on this fork, when using Nvidia CUDA on a 12GB vram, I noticed after a while, the gpu run out of memory: […]

Have you changed any of the OCR settings or concurrency from default?

mertalev avatar Oct 24 '25 15:10 mertalev

Have you changed any of the OCR settings or concurrency from default?

Hi, not much - it was the default. Yes, my setup has multiple GPUs, but the issue persists irrespective of the OCR concurrency setting. However, I did find a solution, although I don't have an explanation: the OOM issue disappears when these environment variables are set:

MACHINE_LEARNING_WORKERS=1 MACHINE_LEARNING_REQUEST_THREADS=1

Despite only using one GPU, this setting is still much faster than CPU, so it's what I use now.

Edit: using the reverse proxy at https://github.com/apetersson/immich_ml_balancer and keeping to the same principle of ensuring each GPU only runs a single request thread (the parameters above, plus pinning a specific GPU to each immich ML service), all GPUs can now be utilized without out-of-memory issues.
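The working principle here (at most one in-flight inference request per GPU) can be sketched with a semaphore guarding the session. The session wrapper and `fake_infer` below are stand-ins, not Immich's or ONNX Runtime's actual API; they just mirror the effect of `MACHINE_LEARNING_WORKERS=1` with `MACHINE_LEARNING_REQUEST_THREADS=1`:

```python
import threading
import time

class SerializedSession:
    """Wrap an inference callable so only one request runs at a time,
    mirroring a single worker with a single request thread per GPU."""
    def __init__(self, run_fn):
        self._run = run_fn               # stand-in for session.run(...)
        self._gate = threading.Semaphore(1)

    def run(self, inputs):
        with self._gate:                 # concurrent callers queue here
            return self._run(inputs)

# Demo with a dummy inference function and several concurrent callers,
# tracking how many requests are ever in flight simultaneously.
in_flight, peak = 0, 0
lock = threading.Lock()

def fake_infer(x):
    global in_flight, peak
    with lock:
        in_flight += 1
        peak = max(peak, in_flight)
    time.sleep(0.01)                     # simulate GPU work
    result = x * 2
    with lock:
        in_flight -= 1
    return result

session = SerializedSession(fake_infer)
threads = [threading.Thread(target=session.run, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak)  # -> 1: never more than one request on the "GPU" at once
```

Serializing requests per device bounds peak VRAM use because ONNX Runtime's arena allocations from concurrent requests can't stack, which is consistent with the OOM disappearing under these settings.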

dodysw3 avatar Oct 25 '25 09:10 dodysw3