Solution for CUDA out of memory, trying to allocate 133792483MiB
Thanks to the authors for the great work. I have run into this "trying to allocate several hundred TB of GPU memory" error before.
This is often because you compiled the library on one type of GPU and are calling it on another type. If the two GPU types do not have the same compute architecture, the call fails and reports an absurd memory request that only a top-tier cluster could provide.
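To confirm the mismatch, you can compare the compute capability on each machine. A minimal check, assuming a reasonably recent NVIDIA driver (for the `compute_cap` query field) and a working PyTorch install:

```bash
# Compute capability as reported by the driver (requires a recent nvidia-smi)
nvidia-smi --query-gpu=name,compute_cap --format=csv

# The same information via PyTorch, e.g. (8, 0) for an A100
python -c "import torch; print(torch.cuda.get_device_capability())"
```

If the two machines report different values (e.g. 8.0 vs 9.0), kernels compiled on one will not run on the other.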
The solution: uninstall your current build of the library, remove the wheels and build artifacts entirely (you can simply delete the whole folder and clone it again), and set this environment variable before rebuilding (a consolidated sketch follows below):
```bash
export TORCH_CUDA_ARCH_LIST="8.0;8.6;8.9;9.0"
```
Choose the architectures according to your needs; these values are CUDA compute capabilities. If your test machine's GPU is 8.0 and your computation machine's GPU is 9.0, write "8.0;9.0".
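Putting the steps together, here is a minimal sketch; the package name `your-lib`, the repository URL, and the plain `pip install .` step are placeholders for illustration, so substitute your actual library and its documented build command:

```bash
# 1. Uninstall the wheel that was built for the wrong architecture
pip uninstall your-lib

# 2. Remove the stale checkout and build artifacts entirely, then clone fresh
rm -rf your-lib
git clone https://github.com/example/your-lib.git   # hypothetical URL
cd your-lib

# 3. Build for every architecture you will run on (here: test machine 8.0, computation machine 9.0)
export TORCH_CUDA_ARCH_LIST="8.0;9.0"
pip install .
```

After reinstalling, rerun the failing call on both machines to confirm the huge allocation request is gone.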
This solution comes from https://blog.csdn.net/m0_57143158/article/details/143103426?spm=1001.2014.3001.8078#comments_35737374 (Chinese readers can refer to that post directly).