nerf
ERROR: failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
After setting up the conda environment, activating it, and downloading the data:
conda env create -f environment.yml
conda activate nerf
bash download_example_data.sh
I try to run python run_nerf.py --config config_lego.txt and I get the following error before the model starts training:
tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED Traceback: ... tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(65536, 63), b.shape=(63, 256), m=65536, n=256, k=63 [Op:MatMul]
I already googled it, but none of the solutions available worked in my case. Could someone help me, please?
I believe issue #127 is related to this.
I hit the same problem, and I cannot access #127. Could someone help me?
Hi guys, have you resolved the problem? I ran into the same one.
Just ran into this one as well. I am on Win11 with an RTX 3070 GPU with 8 GB of VRAM... maybe it's related to needing more VRAM to run this?
Update: by playing with the chunk and netchunk parameters I seem to get it to run... for a bit. The default values of 32K and 64K seem far too much for my home GPU to take at once. It now runs for longer before eventually hitting the same CUBLAS error, and with smaller chunk values I can clearly see the VRAM fill up over time. As a test, I have lowered them to chunk=4 and netchunk=8.
Reading through the parameters, I will also try lowering the batch size, or even using the no_batching option, to see if I can then get it to run fully.
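For anyone wondering why chunk and netchunk affect memory at all: the m=65536 in the error message matches the 64K netchunk default, i.e. that many rows go through a single MatMul. Below is a minimal NumPy sketch of the chunking idea, not the repository's code. The names batchify, layer, weights and points are illustrative stand-ins, and the 63x256 weight matrix only mirrors the shapes from the error.

```python
import numpy as np

def batchify(fn, chunk):
    """Wrap fn so it only ever sees at most `chunk` rows per call."""
    if chunk is None:
        return fn

    def wrapped(inputs):
        # Evaluate slice by slice, then stitch the results back together.
        outputs = [fn(inputs[i:i + chunk])
                   for i in range(0, inputs.shape[0], chunk)]
        return np.concatenate(outputs, axis=0)

    return wrapped

# Hypothetical stand-in for the first network layer (63 inputs, 256 units).
rng = np.random.default_rng(0)
weights = rng.standard_normal((63, 256)).astype(np.float32)

def layer(x):
    return x @ weights

points = rng.standard_normal((4096, 63)).astype(np.float32)

full = layer(points)                    # one large matmul
chunked = batchify(layer, 512)(points)  # eight smaller matmuls

print(chunked.shape, np.allclose(full, chunked, rtol=1e-4, atol=1e-4))
```

Smaller chunks lower the peak size of each individual call and give the same result. They would not by themselves explain VRAM filling up over time, so a crash that still happens at chunk=4 probably has another cause.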
Please migrate to tensorflow-gpu 2.8; that will resolve it.
I encountered this issue because my CUDA version was higher than what TensorFlow 1.15 supported. After using a version of TensorFlow modified by NVIDIA to support the higher CUDA version, the issue was resolved.
Please refer to this link: https://www.toutiao.com/article/7226347983518827011/. I have modified environment.yml for hardware with an RTX 3090. As mentioned above, the root cause is that the tensorflow / tensorflow-gpu version and the CUDA version do not match.
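To make the version mismatch concrete, here is a small sketch that compares a TensorFlow / CUDA / cuDNN combination against the tested build configurations listed at https://www.tensorflow.org/install/source#gpu. Only the two TensorFlow versions mentioned in this thread are included. The names TESTED_BUILDS and check_toolkit are made up for illustration, and the version strings are passed in by hand rather than detected.

```python
# Tested build configurations as listed on tensorflow.org/install/source#gpu.
# Only the versions discussed in this thread are recorded here.
TESTED_BUILDS = {
    "1.15": {"cuda": "10.0", "cudnn": "7.4"},
    "2.8": {"cuda": "11.2", "cudnn": "8.1"},
}

def major_minor(version):
    """Reduce '2.8.0' to '2.8' so patch releases compare equal."""
    return ".".join(version.split(".")[:2])

def check_toolkit(tf_version, cuda_version, cudnn_version):
    """Return a list of mismatch descriptions (empty if everything lines up)."""
    expected = TESTED_BUILDS.get(major_minor(tf_version))
    if expected is None:
        return ["no tested configuration recorded for TensorFlow " + tf_version]
    problems = []
    if major_minor(cuda_version) != expected["cuda"]:
        problems.append("CUDA %s found, %s expected"
                        % (cuda_version, expected["cuda"]))
    if major_minor(cudnn_version) != expected["cudnn"]:
        problems.append("cuDNN %s found, %s expected"
                        % (cudnn_version, expected["cudnn"]))
    return problems

# TensorFlow 1.15 next to a CUDA 11.x toolkit: both entries are flagged.
print(check_toolkit("1.15.0", "11.2", "8.1"))
# The combination tested for TensorFlow 2.8: nothing is flagged.
print(check_toolkit("2.8.0", "11.2", "8.1"))
```

An empty list only means the combination matches the tested table. It does not guarantee that the driver or the GPU itself is supported.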
Did you just pip install --user nvidia-pyindex and then call python run_nerf.py --config config_lego.txt again? I'm kind of lost on how nvidia-pyindex comes into play and how to use it.
Thank you so much, this was very helpful, both the fork with TF2 support and the blog post. Now I have this running at the max capacity of my 3070 on Win11.
Two minor things I found that you might want to check:
- I had a rather minor dynamic library load error, which is easily fixed by using cudatoolkit 11.2 instead of 11 and cudnn 8.1 instead of 8.0. These are the exact versions recommended for tensorflow 2.8 as per https://www.tensorflow.org/install/source#gpu
- I also had a problem with imageio, because the latest versions apparently no longer recognize the "ignoregamma" parameter used in this codebase. So it is better to pin a version of imageio (I tried 2.16 and it worked; it might not be the ideal one, and newer ones may work too).