instant-ngp
"Uncaught exception: CUDA Error: failed with error an illegal memory access was encountered" when trying to load a snapshot from a base.msgpack file
Hey,
I am running on Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-89-generic x86_64) with CMake 3.23.0, Python 3.9.7, and CUDA 11.0.
Compiling and running the code works properly. However, taking a snapshot of a trained scene and starting it up again does not work. Taking the snapshot works properly as far as I can tell: the base.msgpack file is created. When trying to restart the training from that snapshot, the illegal memory access error is encountered:
root@de-lx-titantm:/home/instant-ngp# ./build/testbed --scene data/nerf/Videos2/closet --snapshot data/nerf/Videos2/closet/base.msgpack
08:02:51 INFO Loading NeRF dataset from
08:02:51 INFO data/nerf/Videos2/closet/transforms.json
08:02:51 SUCCESS Loaded 103 images of size 1080x1920 after 0s
08:02:51 INFO cam_aabb=[min=[0.5,0.5,0.5], max=[0.5,0.5,0.5]]
08:02:52 INFO Loading network config from: data/nerf/Videos2/closet/base.msgpack
08:02:53 INFO GridEncoding: Nmin=16 b=1.5874 F=2 T=2^19 L=16
08:02:53 INFO Density model: 3--[HashGrid]-->32--[FullyFusedMLP(neurons=64,layers=3)]-->1
08:02:53 INFO Color model: 3--[SphericalHarmonics]-->16+16--[FullyFusedMLP(neurons=64,layers=4)]-->3
08:02:53 INFO total_encoding_params=13288400 total_network_params=10240
Could not free memory: CUDA Error: cudaFree(rawptr) failed with error an illegal memory access was encountered
CUDA Error: cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
CUDA Error: cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
CUDA Error: cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
CUDA Error: cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
CUDA Error: cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
CUDA Error: cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
Could not free memory: CUDA Error: cudaFree(rawptr) failed with error an illegal memory access was encountered
CUDA Error: cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
CUDA Error: cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
CUDA Error: cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
CUDA Error: cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
Could not free memory: CUDA Error: cudaFree(rawptr) failed with error an illegal memory access was encountered
08:02:53 ERROR Uncaught exception: CUDA Error: cudaMemcpyAsync(&n_alive, m_alive_counter.data(), sizeof(uint32_t), cudaMemcpyDeviceToHost, stream) failed with error an illegal memory access was encountered
The same thing does not happen when running the command with --no_gui.
Any ideas what could be the reason? I appreciate any help.
same problem here
Hi there, could you check whether this has been resolved by now?
Every time I've had this issue it was related to running out of VRAM -- maybe using a different GPU or shrinking your dataset could help?
In my experience, the extra 2-3 GB of VRAM that the GUI takes up can be enough to overwhelm a card.
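To check whether VRAM headroom is the problem before launching the GUI, something like the following could help. It is only a rough sketch: it shells out to `nvidia-smi` with its `--query-gpu` CSV output, and the helper names (`parse_gpu_memory`, `query_gpu_memory`, `has_headroom`) and the 3 GiB threshold are made up for illustration, not part of instant-ngp.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' lines (in MiB) as printed by
    nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        gpus.append({"used_mib": used, "total_mib": total, "free_mib": total - used})
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and return one dict per visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)


def has_headroom(gpu, required_mib):
    """True if the GPU reports at least required_mib MiB of free memory."""
    return gpu["free_mib"] >= required_mib


if __name__ == "__main__":
    # 3072 MiB is an assumed budget for the GUI, based on the 2-3 GB estimate above.
    for index, gpu in enumerate(query_gpu_memory()):
        status = "ok" if has_headroom(gpu, 3072) else "likely too little VRAM for the GUI"
        print(f"GPU {index}: {gpu['free_mib']} MiB free of {gpu['total_mib']} MiB ({status})")
```

Running it once before training and once while the testbed is up should show roughly how much the GUI adds on a given card.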
@Tom94 I ran into this issue as well and I just checked out the latest commit on master and tested this with the fox dataset on a V100. When I save and then load a snapshot with the optimizer state things crash with the stacktrace in the original issue description. When I save and load a snapshot without the optimizer state things seem to work fine.
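For anyone scripting this through the Python bindings, a minimal sketch of saving without the optimizer state might look like the following. It assumes the `Testbed` object exposes `save_snapshot(path, include_optimizer_state)` with the flag as the second positional argument, which is how I understand the bindings to work; please verify against your build. The wrapper name `save_snapshot_without_optimizer` is hypothetical.

```python
def save_snapshot_without_optimizer(testbed, path):
    """Save a snapshot that omits the optimizer state.

    `testbed` is expected to be a pyngp Testbed (or any object with a
    compatible save_snapshot method). The second argument is ASSUMED to be
    the include_optimizer_state flag.
    """
    testbed.save_snapshot(path, False)
    return path


# Hypothetical usage with the real bindings (not executed here):
#   import pyngp as ngp
#   testbed = ngp.Testbed()
#   ... load the scene and train ...
#   save_snapshot_without_optimizer(testbed, "data/nerf/fox/base.msgpack")
```

The trade-off is that training resumed from such a snapshot starts with a fresh optimizer state, so it may take a little while to settle again.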
I have this issue when running testbed -m image. At the same time, testbed -m nerf segfaults.
My system: RTX 3090, Z690, DDR5, i7 12700k, Win11.
Hello, I am experiencing the same error; however, my stack trace looks quite different from the one above:
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
F:\projects\instant-ngp\dependencies\tiny-cuda-nn\src\fully_fused_mlp.cu:702 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
F:\projects\instant-ngp\dependencies\tiny-cuda-nn\src\fully_fused_mlp.cu:703 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
F:\projects\instant-ngp\dependencies\tiny-cuda-nn\src\fully_fused_mlp.cu:702 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
F:\projects\instant-ngp\dependencies\tiny-cuda-nn\src\fully_fused_mlp.cu:703 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
F:\projects\instant-ngp\dependencies\tiny-cuda-nn\src\fully_fused_mlp.cu:702 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
F:\projects\instant-ngp\dependencies\tiny-cuda-nn\src\fully_fused_mlp.cu:703 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
F:\projects\instant-ngp\dependencies\tiny-cuda-nn\src\fully_fused_mlp.cu:702 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
F:\projects\instant-ngp\dependencies\tiny-cuda-nn\src\fully_fused_mlp.cu:703 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
F:\projects\instant-ngp\dependencies\tiny-cuda-nn\src\fully_fused_mlp.cu:702 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
F:\projects\instant-ngp\dependencies\tiny-cuda-nn\src\fully_fused_mlp.cu:703 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
Here is a screenshot of the graph from my EVGA Precision X1 hardware monitor. The maximum memory utilization reached 7.5 GB and was consistent across repeated runs.

I have been a long time lurker and I am a HUGE fan of this work. Thank you!
In my case, the solution was to save a snapshot without the optimizer state, as mentioned by @hturki.
same issue