
feat: add OCR functionality and related configurations


Description

feat: add OCR functionality and related configurations

  • Added new OCR-related translations in English and Simplified Chinese
  • Integrated PaddleOCR for text recognition in images
  • Enhanced search capabilities by adding an “OCR” search type
  • Implemented backend support for OCR processing, including database schema updates and service integration
  • Updated frontend components to support OCR settings and search filters, including mobile OCR search-filter functionality

How Has This Been Tested?

  • [x] Tested on a fresh full deployment
  • [x] Tested on an incremental / rolling upgrade

Screenshots (if appropriate)


https://github.com/user-attachments/assets/d61cd4ed-57fa-4518-abee-d0cb577c31a4

https://github.com/user-attachments/assets/4ed37925-a068-42be-abbf-cac0be2b97f7

Checklist:

  • [x] I have performed a self-review of my own code
  • [x] I have made corresponding changes to the documentation if applicable
  • [x] I have no unrelated changes in the PR.
  • [x] I have confirmed that any new dependencies are strictly necessary.
  • [x] I have written tests for new code (if applicable)
  • [x] I have followed naming conventions/patterns in the surrounding code
  • [x] All code in src/services/ uses repositories implementations for database calls, filesystem operations, etc.
  • [x] All code in src/repositories/ is pretty basic/simple and does not have any immich specific logic (that belongs in src/services/)

flipped-1121 avatar Jun 01 '25 14:06 flipped-1121

I rebased and made some changes:

  • I switched the library to RapidOCR, which uses ONNX Runtime and hence has wider hardware-acceleration (HWA) compatibility
  • The response is now sent as columnar arrays since the number of results can be quite high
  • The detection and recognition scores are both stored in the DB
  • The box positions are normalized so they're independent of the resolution at which the image was processed
  • Changed the settings to expose a detection threshold, a recognition threshold, and a max resolution, which I think are the main parameters an admin would care about
  • Indexed the search table with a trigram index
  • Removed the ocr endpoint in favor of adding an ocr field to the existing endpoints

Still to do: update the mobile app, add tests, and do more testing. The defaults may also need to be tweaked - it is fairly slow with these settings.
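The box normalization and columnar response described above can be illustrated with a small sketch. The helper names and field names here are hypothetical (the actual Immich implementation may differ); the point is that boxes are scaled into [0, 1] so they are resolution-independent, and per-result objects are transposed into parallel arrays to keep large payloads compact:

```python
def normalize_box(box, width, height):
    """Scale pixel-space corner points into [0, 1] so the stored
    coordinates are independent of the processed resolution."""
    return [(x / width, y / height) for x, y in box]

def to_columnar(results):
    """Transpose a list of per-word results into parallel arrays,
    which is more compact to send when there are many detections."""
    return {
        "text": [r["text"] for r in results],
        "detScore": [r["det_score"] for r in results],
        "recScore": [r["rec_score"] for r in results],
        "box": [r["box"] for r in results],
    }

# Example: one detection in a 1920x1080 image
results = [{
    "text": "EXIT",
    "det_score": 0.93,
    "rec_score": 0.88,
    "box": [(192, 108), (384, 108), (384, 216), (192, 216)],
}]
results[0]["box"] = normalize_box(results[0]["box"], 1920, 1080)
payload = to_columnar(results)
print(payload["box"][0])  # corner points now in [0, 1]
```

Storing both detection and recognition scores, as the PR does, lets the thresholds be applied independently: a box can be confidently detected but poorly recognized, and vice versa.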

mertalev avatar Jun 16 '25 18:06 mertalev


Hi, on this fork, when using Nvidia CUDA with 12 GB of VRAM, I noticed that after a while the GPU runs out of memory:

immich_machine_learning | ONNXRuntimeError: Traceback (most recent call last):
immich_machine_learning |   File "/opt/venv/lib/python3.11/site-packages/rapidocr/inference_engine/onnxruntime/main.py", line 90, in __call__
immich_machine_learning |     return self.session.run(self.get_output_names(), input_dict)[0]
immich_machine_learning |   File "/opt/venv/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
immich_machine_learning |     return self._sess.run(output_names, input_feed, run_options)
immich_machine_learning | onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Resize node. Name:'Resize.6' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 802160640

I wonder if there is a known memory leak or overly aggressive caching on the RapidOCR side?
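For context on why the PR's max-resolution setting is relevant to a failure like this: the failing Resize node allocates an output buffer proportional to the scaled image area (the ~800 MB request above), so capping the longer side of the input before inference also caps that allocation. A stdlib-only sketch of that kind of downscaling calculation (illustrative only, not Immich's actual code, and the default of 736 is an assumption):

```python
def capped_size(width, height, max_side=736):
    """Return (w, h) scaled so the longer side is at most max_side,
    preserving aspect ratio. No-op if the image is already small enough."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))

# An 8000x6000 scan would otherwise produce very large intermediate
# buffers inside the model; memory use shrinks quadratically with the cap.
print(capped_size(8000, 6000))  # -> (736, 552)
print(capped_size(640, 480))    # -> (640, 480), unchanged
```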

dodysw3 avatar Oct 24 '25 02:10 dodysw3

Hi on this fork, when using Nvidia CUDA on a 12GB vram, I noticed after a while, the gpu run out of memory: […]

Have you changed any of the OCR settings or concurrency from default?

mertalev avatar Oct 24 '25 15:10 mertalev

Have you changed any of the OCR settings or concurrency from default?

Hi, not much - it was the default. Yes, my setup has multiple GPUs, but the issue persists irrespective of the OCR concurrency setting. However, I did find a solution, although I don't have an explanation: the OOM issue disappears when these environment variables are set:

MACHINE_LEARNING_WORKERS=1 MACHINE_LEARNING_REQUEST_THREADS=1

Despite only using one GPU, this setting is still much faster than CPU, so it's what I use now.

Edit: using the reverse proxy at https://github.com/apetersson/immich_ml_balancer and keeping to the same principle of ensuring each GPU only runs a single request thread (the parameters above, plus pinning a specific GPU to each immich ML service), all GPUs can now be utilized without out-of-memory issues.
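The working principle here (at most one in-flight inference request per GPU) can be sketched with a semaphore guarding the session. The session wrapper and `fake_infer` below are stand-ins, not Immich's or ONNX Runtime's actual API; they just mirror the effect of `MACHINE_LEARNING_WORKERS=1` with `MACHINE_LEARNING_REQUEST_THREADS=1`:

```python
import threading
import time

class SerializedSession:
    """Wrap an inference callable so only one request runs at a time,
    mirroring a single worker with a single request thread per GPU."""
    def __init__(self, run_fn):
        self._run = run_fn               # stand-in for session.run(...)
        self._gate = threading.Semaphore(1)

    def run(self, inputs):
        with self._gate:                 # concurrent callers queue here
            return self._run(inputs)

# Demo with a dummy inference function and several concurrent callers,
# tracking how many requests are ever in flight simultaneously.
in_flight, peak = 0, 0
lock = threading.Lock()

def fake_infer(x):
    global in_flight, peak
    with lock:
        in_flight += 1
        peak = max(peak, in_flight)
    time.sleep(0.01)                     # simulate GPU work
    result = x * 2
    with lock:
        in_flight -= 1
    return result

session = SerializedSession(fake_infer)
threads = [threading.Thread(target=session.run, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak)  # -> 1: never more than one request on the "GPU" at once
```

Serializing requests per device bounds peak VRAM use because ONNX Runtime's arena allocations from concurrent requests can't stack, which is consistent with the OOM disappearing under these settings.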

dodysw3 avatar Oct 25 '25 09:10 dodysw3