SpinNet
Inference RuntimeError: CUDA out of memory
Hi,
when running preparation.py for 3DMatch, I got the following error:
RuntimeError: CUDA out of memory. Tried to allocate 5.15 GiB (GPU 0; 10.76 GiB total capacity; 6.27 GiB already allocated; 3.52 GiB free; 6.29 GiB reserved in total by PyTorch)
Is this normal behavior? Since this is a provided demo, I would assume it should run without such an issue on a GPU with 11 GB of memory.
Also, could you give a rough number for the inference runtime, e.g., how long it takes to process 4096 keypoints?
Many thanks!
Hi, @rui2016, thanks for your interest in our work!
You can try reducing the step_size in preparation.py according to your GPU memory usage.
https://github.com/QingyongHu/SpinNet/blob/5581e7d184bc3b4d525d5b5e58777ea04dfdc9ab/ThreeDMatch/Test/preparation.py#L97
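The idea, roughly, is that the keypoints are processed in chunks of step_size, so the peak GPU memory scales with the chunk size rather than with the total number of keypoints. A minimal sketch of that pattern (the names `model` and `patches` are placeholders for illustration; the actual loop is in preparation.py):

```python
import torch

# Hypothetical names for illustration; see preparation.py for the real loop.
# model:   a loaded SpinNet descriptor network in eval mode, on the GPU
# patches: (N, K, 3) tensor of local patches around the N keypoints

def extract_descriptors(model, patches, step_size=50):
    """Run the network in chunks of `step_size` keypoints to bound GPU memory."""
    descriptors = []
    with torch.no_grad():
        for start in range(0, patches.shape[0], step_size):
            chunk = patches[start:start + step_size].cuda()
            desc = model(chunk)              # (step_size, D) descriptors
            descriptors.append(desc.cpu())   # move results off the GPU right away
            del chunk, desc                  # release memory before the next chunk
    return torch.cat(descriptors, dim=0)
```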
For the current version, it may take more than ten seconds to process 5000 keypoints at a time, and the whole inference may take three or four hours.
Best, Sheng
Hi @aosheng1996,
thanks for the reply. Yes, I was able to run it by decreasing step_size to 40. Nevertheless, I still have the impression that the current implementation might not be very efficient in terms of both memory and runtime. Using the GPU mentioned above (11 GB), extracting descriptors for 8192 keypoints takes ~35 s (with step_size=40). The descriptiveness of the descriptors looks promising, though.
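For reference, this is roughly how I measure it (a sketch, reusing the hypothetical chunked `extract_descriptors` helper from above; any equivalent forward pass works the same way):

```python
import time
import torch

# Time the extraction and record peak GPU memory for one pass over the keypoints.
torch.cuda.reset_peak_memory_stats()
torch.cuda.synchronize()                  # make sure prior GPU work has finished
t0 = time.time()

desc = extract_descriptors(model, patches, step_size=40)

torch.cuda.synchronize()                  # wait for all kernels before stopping the clock
print(f"{patches.shape[0]} keypoints in {time.time() - t0:.1f} s")
print(f"peak GPU memory: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```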
Looking forward to an improved version if that is in your plan.
Cheers!
As far as I can see, SpinNet requires much more runtime to extract descriptors. Taking the 3DMatch benchmark as an example, extracting 5k + 5k = 10k descriptors consumes about ~75 s, while the classical FPFH descriptor takes less than 1 s.
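For comparison, a classical FPFH baseline is essentially a one-liner, e.g. in Open3D (a sketch; the file name and radii below are illustrative placeholders, not the benchmark settings):

```python
import open3d as o3d

# Illustrative FPFH extraction; parameters are placeholders, not tuned for 3DMatch.
pcd = o3d.io.read_point_cloud("cloud.ply")
pcd.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
fpfh = o3d.pipelines.registration.compute_fpfh_feature(
    pcd, o3d.geometry.KDTreeSearchParamHybrid(radius=0.25, max_nn=100))
print(fpfh.data.shape)  # (33, num_points): a 33-dimensional FPFH descriptor per point
```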