Add method to unload models and free RAM

Open vitonsky opened this issue 10 months ago • 2 comments

The current implementation has no method to free RAM; as a result, the server sometimes goes down with an out-of-memory error, especially when it spawns multiple workers.

Use case

I use EasyOCR + FastAPI + Gunicorn with multiple workers. The server creates one instance of EasyOCR for every language direction and keeps it in RAM for fast access.

When one worker takes requests for ~12 different languages, it spawns N instances of EasyOCR and eventually fails with a "not enough RAM" error.

It would be nice to have a method like close/stop/dispose to stop an instance and free its RAM.
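For reference, a minimal sketch of what such a method might look like. This is not an existing easyocr API; it assumes the models are only referenced through the reader's detector and recognizer attributes:

import gc

import easyocr
import torch

class DisposableReader(easyocr.Reader):
    # Hypothetical close() method; easyocr.Reader has no such method today.
    def close(self):
        # Drop the references to the detection and recognition networks.
        del self.detector
        del self.recognizer
        gc.collect()
        if torch.cuda.is_available():
            # Return PyTorch's cached VRAM to the driver.
            torch.cuda.empty_cache()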

This problem also occurs at the preloading stage, when the server creates EasyOCR instances one by one for every supported language to ensure all models are downloaded and available once the server starts.

We end up with many initialized EasyOCR instances with different models, and they stay in RAM for as long as the preloading script runs.
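As a stopgap for the preloading stage, assuming each reader is the only thing holding its models, the script could release every instance before creating the next. A rough sketch; the language list is illustrative:

import gc

import easyocr

# Hypothetical subset of the ~12 languages mentioned above.
LANGS = ["en", "de", "fr"]

for lang in LANGS:
    # Instantiating a Reader downloads the model files if they are missing.
    reader = easyocr.Reader([lang], gpu=False)
    # Drop the only reference so the CPU RAM can be reclaimed.
    del reader
    gc.collect()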

vitonsky avatar Mar 09 '25 14:03 vitonsky

Why not have a pool of clients for the most used languages ready from the start, one client per language, and then use readtext_batched to process all requests for one language with that single client? Spawning a new client each time seems like a design issue anyway.
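A minimal sketch of that pooling idea (names are illustrative; note that readtext_batched expects the images in a batch to share the same size, or to be resized via its n_width/n_height arguments):

import easyocr

# One long-lived Reader per frequently used language, created at startup.
READER_POOL = {lang: easyocr.Reader([lang]) for lang in ["en", "de", "fr"]}

def ocr_batch(lang, images):
    # Route all requests for one language through that language's Reader
    # and process them in a single batched call.
    return READER_POOL[lang].readtext_batched(images)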

Anyway, I think this could be a good feature for the library and I'm willing to take it up, thanks for suggesting!

brownsloth avatar Jun 24 '25 18:06 brownsloth

I have experimented with this in Google Colab on a T4 GPU; here are the GPU VRAM stats.

Before loading EasyOCR:
Used: 265.00 MB / Total: 15360.00 MB

After loading easyocr reader and before inference:
Used: 515.88 MB / Total: 15360.00 MB

After inference of a 200x60 RGB image:
Used: 549.88 MB / Total: 15360.00 MB

Next up are the things I tried to free the VRAM:
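(The print_gpu_memory helper isn't shown in the snippet; one plausible implementation, matching the Used/Total output above, could be based on torch.cuda.mem_get_info:)

import torch

def print_gpu_memory(label):
    # mem_get_info reports driver-level (free, total) bytes for the device.
    free, total = torch.cuda.mem_get_info()
    used_mb = (total - free) / 1024 ** 2
    print(f"{label}\nUsed: {used_mb:.2f} MB / Total: {total / 1024 ** 2:.2f} MB")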

import gc
import time

import torch

# Move the networks off the GPU (they are wrapped in DataParallel when
# running on GPU, hence .module).
reader.recognizer.module.to("cpu")
reader.detector.module.to("cpu")
time.sleep(2)
print_gpu_memory("After offloading models to cpu:")

# Drop all references to the models and the reader itself.
del reader.recognizer
del reader.detector
del reader
time.sleep(2)
print_gpu_memory("After deleting model instances and reader:")

# Return PyTorch's cached, unused VRAM to the driver.
torch.cuda.empty_cache()
time.sleep(2)
print_gpu_memory("After empty cuda cache:")

gc.collect()
time.sleep(2)
print_gpu_memory("After gc collect:")
And the corresponding output:

After offloading models to cpu:
Used: 549.88 MB / Total: 15360.00 MB

After deleting model instances and reader:
Used: 549.88 MB / Total: 15360.00 MB

After empty cuda cache:
Used: 415.88 MB / Total: 15360.00 MB

After gc collect:
Used: 415.88 MB / Total: 15360.00 MB

Conclusion:

Basically, the only thing that worked is torch.cuda.empty_cache().

It could be that the remaining 415 MB will only be released once the Colab kernel is restarted. Or maybe easyocr loads some tensors onto the GPU internally, but identifying those would be hard without digging into the code.
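One way to check that last hypothesis, sketched here with gc (the try/except guards against objects that raise on attribute access):

import gc

import torch

# List CUDA tensors that are still reachable after the deletions above.
for obj in gc.get_objects():
    try:
        if torch.is_tensor(obj) and obj.is_cuda:
            print(type(obj), tuple(obj.shape), obj.dtype)
    except Exception:
        pass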

Will try more things if I get some time later.

brownsloth avatar Jul 01 '25 19:07 brownsloth