cosypose
RuntimeError: CUDA out of memory // Requirements on graphics card?
Hi,
while executing
python -m cosypose.scripts.run_cosypose_eval --config tless-siso
I receive the following error message:
RuntimeError: CUDA out of memory. Tried to allocate 1.35 GiB (GPU 0; 5.93 GiB total capacity; 1.47 GiB already allocated; 866.50 MiB free; 36.31 MiB cached)
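The numbers in that message already say a lot. Below is a small back-of-the-envelope check in Python. It assumes that, as in the older PyTorch versions that printed "cached", that figure excludes the memory already allocated. The GiB values in the message are rounded to two decimals, so the results are approximate.

```python
# Figures copied from the error message above (1 GiB = 1024 MiB).
MIB_PER_GIB = 1024

total_mib = 5.93 * MIB_PER_GIB       # GPU 0 total capacity
allocated_mib = 1.47 * MIB_PER_GIB   # memory held by live PyTorch tensors
free_mib = 866.50                    # free memory reported on the device
cached_mib = 36.31                   # reserved by PyTorch's caching allocator but unused (assumption, see above)
requested_mib = 1.35 * MIB_PER_GIB   # size of the allocation that failed

# How much more memory the failing allocation needed than was free.
shortfall_mib = requested_mib - free_mib
# Memory on the card that this process's allocator does not account for.
unaccounted_mib = total_mib - allocated_mib - free_mib - cached_mib

print(f"requested {requested_mib:.1f} MiB, free {free_mib:.1f} MiB, short by {shortfall_mib:.1f} MiB")
print(f"not accounted for by this process's allocator: {unaccounted_mib:.1f} MiB")
```

This gives a shortfall of roughly 516 MiB, and about 3.6 GiB of the 6 GB card that is not attributed to this PyTorch process. That memory is presumably used by something else (other processes, the CUDA context, the display server, or similar); `nvidia-smi` shows which processes hold GPU memory.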
According to my research online, reducing the batch size is the usual recommendation. However, I don't know where to set it, and as I understand it the batch size shouldn't matter for this command, since I am using the already pre-trained network.
Could the error be caused by certain hardware requirements for reproducing the results? I am using Ubuntu 18.04.5 LTS and an NVIDIA GeForce GTX 1060 6GB (with nvidia-driver-450).
Here is a larger part of my terminal output:
1:06:35.398140 - Scene: [6]
1:06:35.398203 - Views: [359]
1:06:35.398260 - Group: [2732]
1:06:35.398285 - Image has 5 gt detections. (not used)
1:06:35.701966 - Pose prediction on 4 detections (n_iterations=1): 0:00:00.063503
1:06:35.954221 - Pose prediction on 4 detections (n_iterations=4): 0:00:00.250793
1:06:35.720832 - --------------------------------------------------------------------------------
100%|███████████████████████████████████████████████████████████| 10080/10080 [1:06:24<00:00, 2.53it/s]
1:06:47.763242 - Done with predictions
100%|█████████████████████████████████████████████████████████████| 10080/10080 [39:28<00:00, 4.26it/s]
1:46:18.765271 - Skipped: pix2pose_detections/coarse/iteration=1 (N=50023)
1:46:18.765351 - Skipped: pix2pose_detections/refiner/iteration=1 (N=50023)
1:46:18.765377 - Skipped: pix2pose_detections/refiner/iteration=2 (N=50023)
1:46:18.765398 - Skipped: pix2pose_detections/refiner/iteration=3 (N=50023)
1:46:18.765419 - Evaluation : pix2pose_detections/refiner/iteration=4 (N=50023)
0%| | 0/10080 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/rosmatch/anaconda3/envs/cosypose/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/rosmatch/anaconda3/envs/cosypose/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/rosmatch/cosypose/cosypose/scripts/run_cosypose_eval.py", line 491, in <module>
Hello, I have the same issue. Did you fix it? Thanks a lot for your answer.
Same here
This may be done by changing the batch_size in run_pose_training.py.
I solved this with the following changes:
- bullet_batch_renderer.py: workers from 8 to 1
- multiview_predictor.py: batch size (nsym) from 64 to 1
- run_bop_inference.py: workers from 8 to 1
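The changes above lower the batch size and worker counts by hand. A generic alternative is to start with a large batch and halve it only when a CUDA out-of-memory error occurs. The sketch below is not part of cosypose: `run_with_backoff` and `process_batch` are hypothetical names, and `process_batch` stands in for whatever code runs the model on one batch. It relies on PyTorch reporting CUDA OOM as a `RuntimeError` whose message contains "out of memory" (newer versions raise `torch.cuda.OutOfMemoryError`, which is a subclass of `RuntimeError`, so the check still applies). It only addresses batch size, not the number of renderer workers.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def is_cuda_oom(err: Exception) -> bool:
    """Heuristic: PyTorch reports CUDA OOM as a RuntimeError mentioning 'out of memory'."""
    return isinstance(err, RuntimeError) and "out of memory" in str(err)


def run_with_backoff(
    items: Sequence[T],
    process_batch: Callable[[Sequence[T]], List[R]],  # hypothetical: runs the model on one batch
    batch_size: int,
) -> List[R]:
    """Process `items` in mini-batches, halving the batch size whenever a batch hits OOM."""
    results: List[R] = []
    start = 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            results.extend(process_batch(batch))
        except RuntimeError as err:
            if not is_cuda_oom(err) or batch_size == 1:
                raise  # not an OOM, or already at batch size 1: give up
            batch_size = max(1, batch_size // 2)
            # With real PyTorch you would typically call torch.cuda.empty_cache() here.
            continue  # retry from the same position with the smaller batch
        start += len(batch)
    return results


# Usage with a fake "model" that runs out of memory on batches larger than 4.
def fake_model(batch):
    if len(batch) > 4:
        raise RuntimeError("CUDA out of memory. Tried to allocate ...")
    return [x * 2 for x in batch]


print(run_with_backoff(list(range(10)), fake_model, batch_size=64))
# → [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Halving on failure keeps throughput high on larger cards while still finishing on smaller ones, at the cost of a few wasted attempts at the start.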
Same here. Is there any other suggestion? Unfortunately, Johannes's solution didn't work for me. @JohannesAma, did it really work for you in the siso tless case?
My NVIDIA card has 8 GB of memory. Maybe yours is smaller and you have to reduce the batch size and the number of workers in some other modules that are used in the siso tless case.
I have the same problem and the suggested solution didn't work. Is there any other solution? Thanks in advance.
I'm sorry, I don't know of another solution. The number of workers and the batch size are the parameters that determine the load on the graphics card, so maybe you have to set them even smaller.
The main reason for this problem is that the dataset processed during evaluation is too large, while the GPU running the program has less than 8 GB of memory. The root cause is this line of code in run_cosypose_eval.py (line 443):
eval_metrics[preds_k], eval_dfs[preds_k] = eval_runner.evaluate(preds)
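If evaluating all predictions in one call is what exhausts memory, one idea is to evaluate them in chunks. The sketch below assumes, without having checked cosypose's code, that `eval_runner.evaluate` computes its metrics per image and that per-chunk results can simply be concatenated; any global step (for example normalization over the whole dataset) would break that. The helper `evaluate_in_chunks` and its parameters are hypothetical. It groups predictions by a key (for example scene and view id) so that all predictions for one image stay in the same chunk.

```python
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

P = TypeVar("P")
R = TypeVar("R")


def evaluate_in_chunks(
    preds: Iterable[P],
    key: Callable[[P], Hashable],               # e.g. (scene_id, view_id) of a prediction
    evaluate: Callable[[List[P]], List[R]],     # hypothetical per-chunk evaluation
    max_chunk: int,
) -> List[R]:
    # Group predictions so that everything sharing a key is evaluated together.
    groups: Dict[Hashable, List[P]] = {}
    for p in preds:
        groups.setdefault(key(p), []).append(p)

    results: List[R] = []
    chunk: List[P] = []
    for group in groups.values():
        # Flush before the chunk would exceed max_chunk; an oversized group still goes alone.
        if chunk and len(chunk) + len(group) > max_chunk:
            results.extend(evaluate(chunk))
            chunk = []
        chunk.extend(group)
    if chunk:
        results.extend(evaluate(chunk))
    return results


# Usage: 5 "scenes" with 3 predictions each, at most 7 predictions per call.
preds = [(scene, i) for scene in range(5) for i in range(3)]
chunk_sizes = []

def fake_evaluate(chunk):
    chunk_sizes.append(len(chunk))
    return [value for _, value in chunk]

evaluate_in_chunks(preds, key=lambda p: p[0], evaluate=fake_evaluate, max_chunk=7)
print(chunk_sizes)  # → [6, 6, 3]
```

This only lowers peak GPU memory if the evaluation's memory use grows with the number of predictions passed in a single call.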
Possible solutions:
- Go to the "local_data" folder and delete some of the data. Then run the pre-training; usually the results will not be a problem. Then run the evaluation again.
- Don't use the GPU: move all data and models to the CPU (this requires repeated code debugging).
- Modify the model to use AMP (automatic mixed precision). However, this is a lot of work, and if you are not careful it is easy to break the whole program.
In fact, the evaluation step is not only for verifying that the results are correct: the model can also be used to evaluate other datasets. The main thing to modify for that is LOCAL_DATA_DIR.