redner
GPU illegal memory access
Hi! I followed your instructions and successfully built redner from source on Ubuntu 16.04 LTS with Python 3.7 and CUDA 10.1, but I ran into problems with the GPU version.
When I run the programs in "/redner/tutorials", such as "01_optimize_single_triangle.py" and "02_pose_estimation.py", they throw the error CUDA Runtime Error: an illegal memory access was encountered at /home/ubuntu/redner/buffer.h:86. When I change the line pyredner.set_use_gpu(torch.cuda.is_available()) to pyredner.set_use_gpu(False), that is, use the CPU version manually, they work fine. Is there anything wrong with my setup? Or what step do I need to take to address this problem? Thanks a lot!
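For reference, here is a minimal sketch of the workaround (following the tutorial pattern; the scene setup is omitted, and this is not the exact tutorial code):

import torch
import pyredner

# GPU path -- this is what triggers the illegal memory access on my machine:
# pyredner.set_use_gpu(torch.cuda.is_available())

# CPU fallback -- renders fine, just slower:
pyredner.set_use_gpu(False)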
Here is the output of nvidia-smi and nvcc --version:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   36C    P8    29W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 10.0, V10.0.130
Does the same issue happen if you pip install redner-gpu? Maybe also check out the Dockerfile.
If you can create an environment (such as a Dockerfile) for me to reproduce this, it would be a lot easier for me to investigate.
I did try pip install redner-gpu at first, but it returned this error:
ERROR: Could not find a version that satisfies the requirement redner-gpu (from versions: none)
ERROR: No matching distribution found for redner-gpu
So I installed it manually.
By the way, pip -V is pip 19.3.1 from /usr/local/lib/python3.5/dist-packages/pip (python 3.5), and python -V is Python 3.7.4.
I'll build a docker image to help you diagnose the problem, thank you!
Seems that your pip is pointing to a different Python version compared to your main python. Try python -m pip install redner-gpu.
Thanks for your suggestion! I've successfully installed it with python -m pip install redner-gpu. But unfortunately, the same error remains (CUDA Runtime Error: an illegal memory access was encountered at /tmp/pip-req-build-m4k01szw/buffer.h:86).
Hmm. I'll wait for your Docker image then. You can take a look at manylinux-gpu.Dockerfile to see how I set those up.
By the way, your nvcc version (10.0) doesn't match the CUDA version (10.1) reported by nvidia-smi. Not sure if this matters, though, since pip install didn't fix it.
Could be related to https://github.com/BachiLi/redner/issues/38
I only tested redner on GPUs with compute capability >= 6.0, and OptiX Prime could behave differently on an older card. I thought I had fixed it by adding optix_scene->finish();, but it could be that there are some other undocumented behaviors of OptiX Prime that cause a similar issue. One way to check this is to add a cuda_synchronize() after optix_scene->finish();. Let me know if this fixes your issue or not.
Thanks for the information! I've tried adding the line, but it still gives the same error :( I've built a docker image; how would you prefer me to share it with you?
Upload your Dockerfile, or maybe upload the image to Google Drive or Dropbox.
Hi, I've switched the GPU to a Tesla V100 with nvcc 10.0, CUDA 10.1, Python 3.6, pip3 (for Python 3.6), and PyTorch 1.3.1, used the command pip3 install redner-gpu, and the program ran successfully! It seems to have been a version incompatibility issue. Thanks for your prompt help!
Most likely there is something happening with older GPUs that I don't know about. I'll test it on my own if I manage to get a K80.
I had the same issue with a K80 GPU; the same environment and code worked without an issue on a P100.
Can confirm that I face the same issue on Tesla K80 but not on Titan X GPUs. Any idea what's going wrong?
Error stack on Tesla K80:
/home/smadan/.local/lib/python3.6/site-packages/pyredner/render_pytorch.py:214: UserWarning: Converting shape vertices from cpu to cuda:0, this can be inefficient.
warnings.warn('Converting shape vertices from {} to {}, this can be inefficient.'.format(shape.vertices.device, device))
/home/smadan/.local/lib/python3.6/site-packages/pyredner/render_pytorch.py:216: UserWarning: Converting shape indices from cpu to cuda:0, this can be inefficient.
warnings.warn('Converting shape indices from {} to {}, this can be inefficient.'.format(shape.indices.device, device))
/home/smadan/.local/lib/python3.6/site-packages/pyredner/render_pytorch.py:55: UserWarning: Converting texture from cpu to cuda:0, this can be inefficient.
warnings.warn('Converting texture from {} to {}, this can be inefficient.'.format(mipmap.device, device))
CUDA Runtime Error: an illegal memory access was encountered at /tmp/pip-req-build-it6swr5w/src/buffer.h:86
I got the same warnings on Titan X, but did not run into the CUDA runtime error.
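For what it's worth, those conversion warnings seem separate from the crash; they just indicate that CPU tensors are being copied to the GPU inside the renderer. A minimal sketch of how to avoid them, assuming pyredner.get_device() is available in this version (the exact Shape arguments are omitted since they vary between versions):

import torch
import pyredner

device = pyredner.get_device()  # cuda:0 when the GPU backend is enabled, otherwise cpu

# Create geometry directly on the render device so render_pytorch.py does not
# have to copy it from cpu to cuda:0 on every call.
vertices = torch.tensor([[-1.0, -1.0, 0.0],
                         [ 1.0, -1.0, 0.0],
                         [ 0.0,  1.0, 0.0]], device=device)
indices = torch.tensor([[0, 1, 2]], dtype=torch.int32, device=device)
# ... then pass vertices/indices to pyredner.Shape(...) as usual.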
This is hard for me to debug since I don't have a K80 (or the time). I'll see what I can do. Most likely something is wrong with OptiX Prime. Maybe we should just get rid of it.
Is there an easy way to disable OptiX to check if that solves the problem? If so, I can run tests on Tesla K80 GPUs without OptiX.
Nope. You can try to modify the ray tracing procedure, but it requires a bit of programming effort.
Sounds good, maybe it's easiest to stick to newer GPUs for now then!