
How to limit GPU VRAM usage

Open | Pabloferex opened this issue 1 year ago • 3 comments

Is there any way to limit the GPU VRAM usage?

Pabloferex avatar Oct 22 '24 12:10 Pabloferex

EasyOCR is built on the PyTorch framework, so if PyTorch offers such an option, it should be possible. Check the PyTorch documentation for one.

romanvelichkin avatar Nov 14 '24 06:11 romanvelichkin

    import torch
    torch.cuda.set_per_process_memory_fraction(0.4, 0)
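
A minimal sketch of wiring this up before creating the reader; the 0.4 fraction and device index 0 are only example values:

    import torch
    import easyocr

    # Cap this process at ~40% of device 0's total VRAM; allocations beyond
    # the cap raise a CUDA out-of-memory error instead of growing further.
    if torch.cuda.is_available():
        torch.cuda.set_per_process_memory_fraction(0.4, device=0)

    reader = easyocr.Reader(['en'], gpu=True)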

howff avatar Dec 09 '24 11:12 howff

What @howff said, but that might cause a "CUDA out of memory" error earlier for your use case.

If so:

  • If you are processing multiple images in parallel, you can reduce the batch size (see the sketch at the end of this comment).
  • Since no GPU RAM/cache is cleared once processing finishes, you can explicitly free it for other processes after one easyocr instance returns your result. I did a little experiment for that:
# Step 2: GPU memory utility
from pynvml import nvmlInit, nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo

def print_gpu_memory(message=""):
    nvmlInit()
    handle = nvmlDeviceGetHandleByIndex(0)
    info = nvmlDeviceGetMemoryInfo(handle)
    print(f"{message}\nUsed: {info.used / 1024**2:.2f} MB / Total: {info.total / 1024**2:.2f} MB\n")

# Step 3: Load model + inference
print_gpu_memory("Before loading EasyOCR:")

Output: Before loading EasyOCR: Used: 265.00 MB / Total: 15360.00 MB

import easyocr

reader = easyocr.Reader(['en'], gpu=True)  # Load EasyOCR on GPU

# Generate a simple image with text
import numpy as np
from PIL import Image, ImageDraw

img = Image.new('RGB', (200, 60), color=(255, 255, 255))
d = ImageDraw.Draw(img)
d.text((10, 10), "Hello EasyOCR!", fill=(0, 0, 0))
img_np = np.array(img)

print_gpu_memory("After loading model and before inference:")

Used: 515.88 MB / Total: 15360.00 MB

results = reader.readtext(img_np)
print("OCR Results:", results)

print_gpu_memory("After inference:")

Used: 549.88 MB / Total: 15360.00 MB

import time
for i in range(10):
  time.sleep(2)
  print_gpu_memory("After inference:")

The memory doesn't go down even long after inference has finished, so I tried a few things to free it:

import gc
import torch

del reader
time.sleep(2)
print_gpu_memory("After removing reader:")
time.sleep(2)
gc.collect()
time.sleep(2)
print_gpu_memory("After gc collect:")
time.sleep(2)
torch.cuda.empty_cache()
time.sleep(2)
print_gpu_memory("After emptying cache:")
time.sleep(2)
torch.cuda.ipc_collect()
time.sleep(2)
print_gpu_memory("After gc collect of inter-process shared memory:")
time.sleep(2)

Output:

After removing reader:
Used: 549.88 MB / Total: 15360.00 MB

After gc collect:
Used: 549.88 MB / Total: 15360.00 MB

After emptying cache:
Used: 415.88 MB / Total: 15360.00 MB

After gc collect of inter-process shared memory:
Used: 415.88 MB / Total: 15360.00 MB

So torch.cuda.empty_cache() helps the most.
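
Putting both points together, a rough sketch of an inference call that keeps VRAM low and then frees it for other processes (img_np is the test image from the snippet above; batch_size=1 is just the most memory-frugal example value):

import gc
import torch
import easyocr

reader = easyocr.Reader(['en'], gpu=True)

# Smaller batch_size means fewer text crops are recognized per forward pass,
# so peak VRAM is lower at the cost of throughput.
results = reader.readtext(img_np, batch_size=1)

# Free GPU memory once the result is in hand: drop the last reference to the
# reader, run Python's GC, then return PyTorch's cached blocks to the driver.
del reader
gc.collect()
torch.cuda.empty_cache()
torch.cuda.ipc_collect()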

brownsloth avatar Jun 25 '25 04:06 brownsloth