instant-ngp

"Uncaught exception: CUDA Error: failed with error an illegal memory access was encountered" when trying to load a snapshot from a base.msgpack file

Open: NathanHuetsch opened this issue 3 years ago • 8 comments

Hey,

I am running Ubuntu 18.04.6 LTS (GNU/Linux 5.4.0-89-generic x86_64), CMake 3.23.0, Python 3.9.7, and CUDA 11.0.

Compiling and running the code works properly. However, taking a snapshot of a trained scene and starting it up again does not work. As far as I can tell, taking the snapshot works: the base.msgpack file is created. When I try to restart training from that snapshot, I get the illegal memory access error:

root@de-lx-titantm:/home/instant-ngp# ./build/testbed --scene data/nerf/Videos2/closet --snapshot data/nerf/Videos2/closet/base.msgpack
08:02:51 INFO Loading NeRF dataset from
08:02:51 INFO data/nerf/Videos2/closet/transforms.json
08:02:51 SUCCESS Loaded 103 images of size 1080x1920 after 0s
08:02:51 INFO cam_aabb=[min=[0.5,0.5,0.5], max=[0.5,0.5,0.5]]
08:02:52 INFO Loading network config from: data/nerf/Videos2/closet/base.msgpack
08:02:53 INFO GridEncoding: Nmin=16 b=1.5874 F=2 T=2^19 L=16
08:02:53 INFO Density model: 3--[HashGrid]-->32--[FullyFusedMLP(neurons=64,layers=3)]-->1
08:02:53 INFO Color model: 3--[SphericalHarmonics]-->16+16--[FullyFusedMLP(neurons=64,layers=4)]-->3
08:02:53 INFO total_encoding_params=13288400 total_network_params=10240
Could not free memory: CUDA Error: cudaFree(rawptr) failed with error an illegal memory access was encountered
CUDA Error: cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
CUDA Error: cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
CUDA Error: cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
CUDA Error: cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
CUDA Error: cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
CUDA Error: cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
Could not free memory: CUDA Error: cudaFree(rawptr) failed with error an illegal memory access was encountered
CUDA Error: cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
CUDA Error: cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
CUDA Error: cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
CUDA Error: cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
Could not free memory: CUDA Error: cudaFree(rawptr) failed with error an illegal memory access was encountered
08:02:53 ERROR Uncaught exception: CUDA Error: cudaMemcpyAsync(&n_alive, m_alive_counter.data(), sizeof(uint32_t), cudaMemcpyDeviceToHost, stream) failed with error an illegal memory access was encountered

The same thing does not happen when I run the command with --no_gui.
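"An illegal memory access was encountered" is a sticky CUDA error: once a kernel faults, the CUDA runtime documents that any further CUDA work in that process returns the same error. The cudaFree, cudaEventDestroy, and cudaStreamDestroy lines above are therefore most likely fallout from the fault rather than its cause. Even the final cudaMemcpyAsync may simply be the first checked call that noticed. Setting `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous, so the reported error lands much closer to the kernel that actually faulted. The minimal sketch below reruns the command from the log with and without `--no_gui` under that setting. The binary path and the flags come from the report above. The helper names (`build_testbed_command`, `debug_env`, `run_both_variants`) are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Optional


def build_testbed_command(scene: str, snapshot: str, no_gui: bool = False,
                          binary: str = "./build/testbed") -> List[str]:
    """Rebuild the command line used in the report above."""
    cmd = [binary, "--scene", scene, "--snapshot", snapshot]
    if no_gui:
        # The report says the crash does not happen with --no_gui.
        cmd.append("--no_gui")
    return cmd


def debug_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and force synchronous CUDA kernel launches."""
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_both_variants(scene: str, snapshot: str) -> Dict[bool, int]:
    """Run with and without the GUI; map no_gui -> process exit code."""
    results: Dict[bool, int] = {}
    for no_gui in (False, True):
        cmd = build_testbed_command(scene, snapshot, no_gui)
        results[no_gui] = subprocess.run(cmd, env=debug_env()).returncode
    return results
```

For example, `run_both_variants("data/nerf/Videos2/closet", "data/nerf/Videos2/closet/base.msgpack")` returns both exit codes. A nonzero code for the GUI run only would match the behaviour described above, and the log from that run should point nearer to the faulting kernel.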

Any ideas what the cause could be? I'd appreciate any help.

NathanHuetsch, Feb 28 '22 08:02

Same problem here.

jcobreros, Mar 01 '22 17:03

Hi there, could you check whether this has been resolved by now?

Tom94, Mar 28 '22 12:03

Every time I've had this issue, it was caused by running out of VRAM. Maybe using a different GPU or shrinking your dataset could help?

In my experience, the extra 2-3 GB of VRAM that the GUI takes up can be enough to overwhelm a card.
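To test the out-of-VRAM theory before launching the GUI, you can compare the free memory reported by `nvidia-smi` against the 2-3 GB the GUI is said to need. This is a sketch under stated assumptions. The 3 GiB reserve (`GUI_OVERHEAD_MIB`) approximates the upper end of the estimate in this comment and is not a measured instant-ngp figure. `needed_mib` is whatever you observed for the same scene when running with `--no_gui`. The `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` query is a standard option and reports memory in MiB.

```python
import subprocess
from dataclasses import dataclass
from typing import List

# Assumption: reserve for the GUI, taken from the 2-3 GB estimate in this thread.
GUI_OVERHEAD_MIB = 3 * 1024

QUERY = ["nvidia-smi", "--query-gpu=index,name,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


@dataclass
class GpuMemory:
    index: int
    name: str
    used_mib: int
    total_mib: int

    @property
    def free_mib(self) -> int:
        return self.total_mib - self.used_mib


def parse_query_output(text: str) -> List[GpuMemory]:
    """Parse nvidia-smi CSV output (no header, no units, values in MiB)."""
    gpus = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        gpus.append(GpuMemory(index=int(parts[0]),
                              name=", ".join(parts[1:-2]),
                              used_mib=int(parts[-2]),
                              total_mib=int(parts[-1])))
    return gpus


def fits_with_gui(gpu: GpuMemory, needed_mib: int) -> bool:
    """True if needed_mib still fits after reserving the assumed GUI overhead."""
    return gpu.free_mib - GUI_OVERHEAD_MIB >= needed_mib


def query_gpus() -> List[GpuMemory]:
    """Run nvidia-smi and return per-GPU memory usage."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_query_output(out)
```

If `fits_with_gui` is false for your card, the GUI run is a plausible OOM candidate. If it is comfortably true, the crash is more likely something other than plain memory exhaustion.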

slash-under, Mar 30 '22 13:03

@Tom94 I ran into this issue as well. I just checked out the latest commit on master and tested it with the fox dataset on a V100. When I save and then load a snapshot that includes the optimizer state, the program crashes with the stack trace from the original issue description. When I save and load a snapshot without the optimizer state, things seem to work fine.
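For a rough sense of scale, the sketch below estimates how much extra data the optimizer state adds for the model in the first log. It assumes an Adam-style optimizer that keeps two 32-bit moments per trainable parameter. That is an assumption: the real optimizer may keep additional state, such as averaged weights or step counters, which this estimate ignores. The parameter counts are the `total_encoding_params` and `total_network_params` values printed in that log.

```python
# Parameter counts copied from the first log in this issue.
TOTAL_ENCODING_PARAMS = 13_288_400
TOTAL_NETWORK_PARAMS = 10_240

BYTES_PER_FLOAT32 = 4
MOMENTS_PER_PARAM = 2  # assumption: Adam-style first and second moments


def optimizer_state_bytes(n_params: int,
                          moments: int = MOMENTS_PER_PARAM,
                          bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Rough size of per-parameter optimizer state (ignores any other state)."""
    return n_params * moments * bytes_per_value


n_params = TOTAL_ENCODING_PARAMS + TOTAL_NETWORK_PARAMS
state = optimizer_state_bytes(n_params)
print(f"{n_params} params -> {state} bytes (~{state / 2**20:.1f} MiB) of optimizer state")
```

Under these assumptions the state comes to roughly 100 MiB, which is small next to the 16 or 32 GB of a V100. On that GPU, the extra state alone therefore seems an unlikely reason to run out of memory. This estimate does not rule out a problem in how that state is restored.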

hturki, Apr 05 '22 05:04

I have this issue when running testbed -m image; at the same time, testbed -m nerf gives a segfault.

shi-yan, Apr 06 '22 05:04

System: RTX 3090, Z690, DDR5, i7-12700K, Windows 11

Hello, I am experiencing the same error. However, my stack trace looks quite different from the one above:

Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
[the line above appears 8 times in a row]
F:\projects\instant-ngp\dependencies\tiny-cuda-nn\src\fully_fused_mlp.cu:702 cudaEventDestroy(m_training_splitk_events[i]) failed with error an illegal memory access was encountered
F:\projects\instant-ngp\dependencies\tiny-cuda-nn\src\fully_fused_mlp.cu:703 cudaStreamDestroy(m_training_splitk_streams[i]) failed with error an illegal memory access was encountered
[the two lines above appear 5 times in a row]
Could not free memory: F:\projects\instant-ngp\dependencies\tiny-cuda-nn\include\tiny-cuda-nn/gpu_memory.h:129 cudaFree(rawptr) failed with error an illegal memory access was encountered
[the line above appears 24 times in a row]
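The paths and line numbers differ, but the two traces can still be compared by which CUDA calls fail. The sketch below extracts (call, error message) pairs with a regular expression built around the "... failed with error ..." wording that both logs share. The pattern and the `failing_calls` helper are assumptions for illustration, not part of instant-ngp or tiny-cuda-nn.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches e.g. "cudaFree(rawptr) failed with error an illegal memory access was encountered".
# The greedy ".*" lets the argument list contain nested parentheses.
FAILED_CALL = re.compile(r"\b(cuda\w+)\(.*\) failed with error (.+)$")


def failing_calls(lines: Iterable[str]) -> Counter:
    """Count (CUDA call, error message) pairs found in a testbed log."""
    counts: Counter = Counter()
    for line in lines:
        m = FAILED_CALL.search(line.strip())
        if m:
            key: Tuple[str, str] = (m.group(1), m.group(2))
            counts[key] += 1
    return counts
```

In the two traces as posted, the cleanup calls (cudaFree, cudaEventDestroy, cudaStreamDestroy) fail with the same illegal-memory-access message. Only the first trace also ends with the cudaMemcpyAsync failure. Both patterns are consistent with a single earlier fault that left the CUDA context unusable.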

Here is a graph I captured with the EVGA Precision X1 hardware monitor: maximum memory utilization reached 7.5 GB, and this was consistent across repeated runs. [image]

I have been a long-time lurker, and I am a HUGE fan of this work. Thank you!

Dmarcotrigiano, Apr 28 '22 16:04

In my case, the solution was to save the snapshot without the optimizer state, as mentioned by @hturki.
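For anyone scripting this workaround through the Python bindings, here is a minimal sketch. It is modelled on the training loop in instant-ngp's scripts/run.py as of early 2022, which is an assumption, so check it against your checkout. It trains for a fixed number of steps and then saves a snapshot with the optimizer state left out. The function name is hypothetical. The `testbed` argument is expected to behave like a `pyngp.Testbed`, exposing `shall_train`, `frame()`, `training_step`, and `save_snapshot(path, include_optimizer_state)`.

```python
from typing import Any


def train_and_save_without_optimizer(testbed: Any, n_steps: int, snapshot_path: str) -> int:
    """Train for n_steps, then save a snapshot without optimizer state.

    Assumes a pyngp.Testbed-like object (see lead-in); returns the final step.
    """
    testbed.shall_train = True
    # frame() advances training by one step and returns False when the
    # testbed wants to stop (e.g. the window was closed).
    while testbed.frame():
        if testbed.training_step >= n_steps:
            break
    # Second argument False: do not include the optimizer state,
    # the workaround reported in this thread.
    testbed.save_snapshot(snapshot_path, False)
    return testbed.training_step
```

With the bindings built, a call might look like `import pyngp as ngp`, `testbed = ngp.Testbed(ngp.TestbedMode.Nerf)`, `testbed.load_training_data("data/nerf/fox")`, then `train_and_save_without_optimizer(testbed, 1000, "base.msgpack")`. The resulting file can then be passed to `--snapshot` as in the original report.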

theFilipko, May 27 '22 12:05

Same issue.

a-sharifi, Jul 23 '22 19:07