
GPU out of memory

Open newsyh opened this issue 1 year ago • 2 comments

When running on the GPU, I get this error:

Loaded detection model vikp/surya_det3 on device cuda with dtype torch.float16
Loaded recognition model vikp/surya_rec2 on device cuda with dtype torch.float16
Detecting bboxes:   0%|          | 0/6 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "/root/miniconda3/envs/surya/bin/surya_ocr", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/root/miniconda3/envs/surya/lib/python3.11/site-packages/ocr_text.py", line 63, in main
    predictions_by_image = run_ocr(images, image_langs, det_model, det_processor, rec_model, rec_processor)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/surya/lib/python3.11/site-packages/surya/ocr.py", line 65, in run_ocr
    det_predictions = batch_text_detection(images, det_model, det_processor)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/surya/lib/python3.11/site-packages/surya/detection.py", line 128, in batch_text_detection
    preds, orig_sizes = batch_detection(images, model, processor, batch_size=batch_size)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/surya/lib/python3.11/site-packages/surya/detection.py", line 74, in batch_detection
    pred = model(pixel_values=batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/surya/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/surya/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/surya/lib/python3.11/site-packages/surya/model/detection/model.py", line 758, in forward
    logits = self.decode_head(encoder_hidden_states)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/surya/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/surya/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/surya/lib/python3.11/site-packages/surya/model/detection/model.py", line 729, in forward
    hidden_states = self.linear_fuse(torch.cat(all_hidden_states[::-1], dim=1))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/surya/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/surya/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/surya/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 458, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/surya/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 454, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.09 GiB. GPU 0 has a total capacity of 9.77 GiB of which 460.81 MiB is free. Including non-PyTorch memory, this process has 9.31 GiB memory in use. Of the allocated memory 8.15 GiB is allocated by PyTorch, and 916.32 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

But running nvidia-smi shows:

Mon Oct 14 02:30:44 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080        Off |   00000000:4B:00.0 Off |                  N/A |
| 42%   43C    P0             87W /  320W |       1MiB /  10240MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3080        Off |   00000000:B1:00.0 Off |                  N/A |
| 42%   36C    P0             83W /  320W |       1MiB /  10240MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

newsyh avatar Oct 14 '24 02:10 newsyh

Reduce the *_BATCH_SIZE parameters.

yangmaozhe avatar Oct 29 '24 06:10 yangmaozhe
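
The suggestion above to lower the *_BATCH_SIZE parameters could be tried with a small wrapper like the following. This is a minimal sketch, not Surya's documented procedure: it assumes the batch sizes are read from environment variables named DETECTOR_BATCH_SIZE and RECOGNITION_BATCH_SIZE (check the settings of your installed Surya version), the values 2 and 16 are arbitrary starting points, and `low_memory_env` and `run_surya_ocr` are hypothetical helper names. PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True is the setting suggested in the PyTorch error message itself.

```python
import os
import subprocess


def low_memory_env(detector_batch=2, recognition_batch=16, base_env=None):
    """Return a copy of the environment with smaller batch sizes set."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumed variable names; confirm them against your Surya version.
    env["DETECTOR_BATCH_SIZE"] = str(detector_batch)
    env["RECOGNITION_BATCH_SIZE"] = str(recognition_batch)
    # Taken from the hint in the CUDA out-of-memory message.
    env["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
    return env


def run_surya_ocr(cli_args, **batch_sizes):
    """Launch the surya_ocr command with the reduced-memory environment."""
    return subprocess.run(
        ["surya_ocr", *cli_args],
        env=low_memory_env(**batch_sizes),
        check=True,
    )


# Show only the variables this sketch adds (empty base environment).
print(low_memory_env(base_env={}))
```

If the out-of-memory error persists, halving the detector batch size again before changing anything else is a reasonable next step, since the traceback above fails during detection.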

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.03 GiB. GPU 0 has a total capacity of 3.80 GiB of which 938.56 MiB is free. Process 3323 has 6.15 MiB memory in use. Including non-PyTorch memory, this process has 2.87 GiB memory in use. Of the allocated memory 2.77 GiB is allocated by PyTorch, and 6.00 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

I am getting this issue too. Can anybody help me resolve it?

edgelearningcentre avatar Sep 21 '25 06:09 edgelearningcentre
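
To see how far short the GPU was in a report like the one above, the requested and free amounts can be read straight from the error text. The sketch below assumes the message keeps the "Tried to allocate X" and "of which Y is free" wording shown in this thread; `oom_shortfall_mib` is a hypothetical helper, not part of PyTorch or Surya.

```python
import re

_UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}
_SIZE = r"([\d.]+) (KiB|MiB|GiB)"


def oom_shortfall_mib(message):
    """Return (requested, free, shortfall) in MiB from a CUDA OOM message."""
    requested = re.search(r"Tried to allocate " + _SIZE, message)
    free = re.search(r"of which " + _SIZE + r" is free", message)
    if requested is None or free is None:
        raise ValueError("message does not look like a CUDA OOM report")
    req_mib = float(requested.group(1)) * _UNIT_TO_MIB[requested.group(2)]
    free_mib = float(free.group(1)) * _UNIT_TO_MIB[free.group(2)]
    return req_mib, free_mib, req_mib - free_mib


# Excerpt of the message quoted in the comment above.
msg = (
    "CUDA out of memory. Tried to allocate 1.03 GiB. GPU 0 has a total "
    "capacity of 3.80 GiB of which 938.56 MiB is free."
)
requested, free, shortfall = oom_shortfall_mib(msg)
print(f"requested {requested:.2f} MiB, free {free:.2f} MiB, short by {shortfall:.2f} MiB")
```

For that message the single allocation is about 116 MiB larger than the free memory, so a modest reduction in batch size may be enough on a 3.80 GiB card; the first report in this thread is short by several GiB and needs a much larger reduction.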