Segfault with WSL2 + GUI
CUDA: 11.6
GPU: RTX 3080
OS: Ubuntu 20.04 (WSL2)
When I run ./build/testbed --scene data/nerf/fox, I hit a segmentation fault as the program tries to create the GUI.
23:09:29 INFO Loading NeRF dataset from
23:09:29 INFO data/nerf/fox/transforms.json
23:09:29 SUCCESS Loaded 50 images of size 1080x1920 after 0s
23:09:29 INFO cam_aabb=[min=[1.0229,-1.33309,-0.378748], max=[2.46175,1.00721,1.41295]]
23:09:29 INFO Loading network config from: configs/nerf/base.json
23:09:29 INFO GridEncoding: Nmin=16 b=1.51572 F=2 T=2^19 L=16
23:09:29 INFO Density model: 3--[HashGrid]-->32--[FullyFusedMLP(neurons=64,layers=3)]-->1
23:09:29 INFO Color model: 3--[Composite]-->16+16--[FullyFusedMLP(neurons=64,layers=4)]-->3
23:09:29 INFO total_encoding_params=13074912 total_network_params=10240
Segmentation fault
Running ./build/testbed --scene data/nerf/fox --no-gui works fine. (But I don't know how to retrieve the results in no-GUI mode.)
23:06:47 INFO Loading NeRF dataset from
23:06:47 INFO data/nerf/fox/transforms.json
23:06:47 SUCCESS Loaded 50 images of size 1080x1920 after 0s
23:06:47 INFO cam_aabb=[min=[1.0229,-1.33309,-0.378748], max=[2.46175,1.00721,1.41295]]
23:06:47 INFO Loading network config from: configs/nerf/base.json
23:06:47 INFO GridEncoding: Nmin=16 b=1.51572 F=2 T=2^19 L=16
23:06:47 INFO Density model: 3--[HashGrid]-->32--[FullyFusedMLP(neurons=64,layers=3)]-->1
23:06:47 INFO Color model: 3--[Composite]-->16+16--[FullyFusedMLP(neurons=64,layers=4)]-->3
23:06:47 INFO total_encoding_params=13074912 total_network_params=10240
23:06:48 INFO iteration=16 loss=0.0304178
23:06:48 INFO iteration=32 loss=0.0106814
23:06:49 INFO iteration=48 loss=0.00569914
23:06:49 INFO iteration=64 loss=0.00367392
23:06:49 INFO iteration=80 loss=0.0061358
23:06:49 INFO iteration=96 loss=0.00810323
23:06:49 INFO iteration=112 loss=0.00719573
23:06:50 INFO iteration=128 loss=0.00576638
23:06:50 INFO iteration=144 loss=0.00483101
23:06:50 INFO iteration=160 loss=0.00399378
23:06:50 INFO iteration=176 loss=0.00363595
23:06:51 INFO iteration=192 loss=0.00347037
23:06:51 INFO iteration=208 loss=0.00316136
23:06:51 INFO iteration=224 loss=0.00311616
23:06:51 INFO iteration=240 loss=0.00285755
23:06:51 INFO iteration=256 loss=0.00247959
23:06:52 INFO iteration=272 loss=0.00248012
+1, I get the same segfault on the testbed fox scene as above.
CUDA: 11.1
GPU: GTX 1080 Ti
OS: Ubuntu 20.04 (WSL2)
12:54:40 INFO Loading NeRF dataset from
12:54:40 INFO data/nerf/fox/transforms.json
12:54:40 SUCCESS Loaded 50 images of size 1080x1920 after 0s
12:54:40 INFO cam_aabb=[min=[0.5,0.5,0.5], max=[0.5,0.5,0.5]]
12:54:40 INFO Loading network config from: configs/nerf/base.json
12:54:40 INFO GridEncoding: Nmin=16 b=1.51572 F=2 T=2^19 L=16
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
12:54:40 INFO Density model: 3--[HashGrid]-->32--[FullyFusedMLP(neurons=64,layers=3)]-->1
12:54:40 INFO Color model: 3--[Composite]-->16+16--[FullyFusedMLP(neurons=64,layers=4)]-->3
12:54:40 INFO total_encoding_params=13074912 total_network_params=9728
Segmentation fault
--no-gui seems to work as above.
Unfortunately, I cannot reproduce this problem. Could you run testbed with a debugger and let me know which line of code is responsible for the segfault?
(If you compiled in RelWithDebInfo mode, this should work out of the box. No need to recompile in debug mode.)
Command:
$ gdb --args ./build/testbed --scene data/nerf/fox
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./build/testbed...
(gdb) run
Starting program: /home/xxx/projects/instant-ngp/build/testbed --scene data/nerf/fox
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffda831000 (LWP 8010)]
17:34:49 INFO Loading NeRF dataset from
[New Thread 0x7fffd9b1c000 (LWP 8011)]
[New Thread 0x7fffd931b000 (LWP 8012)]
[New Thread 0x7fffd8b1a000 (LWP 8013)]
[New Thread 0x7fffd1eff000 (LWP 8014)]
[New Thread 0x7fffd16fe000 (LWP 8015)]
[New Thread 0x7fffd0efd000 (LWP 8016)]
[New Thread 0x7fffbffff000 (LWP 8017)]
[New Thread 0x7fffbf7fe000 (LWP 8018)]
[New Thread 0x7fffbeffd000 (LWP 8019)]
[New Thread 0x7fffbe7fc000 (LWP 8020)]
[New Thread 0x7fffbdffb000 (LWP 8021)]
[New Thread 0x7fffbd7fa000 (LWP 8022)]
17:34:49 INFO data/nerf/fox/transforms.json
17:34:50 SUCCESS Loaded 50 images of size 1080x1920 after 0s
17:34:50 INFO cam_aabb=[min=[0.5,inf,0.5], max=[0.5,-inf,0.5]]
[Thread 0x7fffbdffb000 (LWP 8021) exited]
[Thread 0x7fffbd7fa000 (LWP 8022) exited]
[Thread 0x7fffbe7fc000 (LWP 8020) exited]
[Thread 0x7fffbf7fe000 (LWP 8018) exited]
[Thread 0x7fffbeffd000 (LWP 8019) exited]
[Thread 0x7fffbffff000 (LWP 8017) exited]
[Thread 0x7fffd0efd000 (LWP 8016) exited]
[Thread 0x7fffd16fe000 (LWP 8015) exited]
[Thread 0x7fffd1eff000 (LWP 8014) exited]
[Thread 0x7fffd8b1a000 (LWP 8013) exited]
[Thread 0x7fffd931b000 (LWP 8012) exited]
[Thread 0x7fffd9b1c000 (LWP 8011) exited]
17:34:50 INFO Loading network config from: configs/nerf/base.json
17:34:50 INFO GridEncoding: Nmin=16 b=1.51572 F=2 T=2^19 L=16
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
17:34:50 INFO Density model: 3--[HashGrid]-->32--[FullyFusedMLP(neurons=64,layers=3)]-->1
17:34:50 INFO Color model: 3--[Composite]-->16+16--[FullyFusedMLP(neurons=64,layers=4)]-->3
17:34:50 INFO total_encoding_params=13074912 total_network_params=9728
[Detaching after vfork from child process 8023]
[New Thread 0x7fffd9b1c000 (LWP 8025)]
[New Thread 0x7fffd931b000 (LWP 8026)]
[New Thread 0x7fffd8b1a000 (LWP 8027)]
[New Thread 0x7fffd1eff000 (LWP 8028)]
[New Thread 0x7fffbe6ec000 (LWP 8029)]
Thread 1 "testbed" received signal SIGSEGV, Segmentation fault.
0x00007ffff7fe6ea5 in ?? () from /lib64/ld-linux-x86-64.so.2
Here is the stacktrace that I received after the segfault:
(gdb) backtrace
#0 0x00007ffff7fe6ea5 in ?? () from /lib64/ld-linux-x86-64.so.2
#1 0x00007ffff036c838 in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffa450, operate=<optimized out>, args=<optimized out>) at dl-error-skeleton.c:208
#2 0x00007ffff036c903 in __GI__dl_catch_error (objname=0x55555670fe50, errstring=0x55555670fe58, mallocedp=0x55555670fe48, operate=<optimized out>, args=<optimized out>) at dl-error-skeleton.c:227
#3 0x00007ffff074bb59 in _dlerror_run (operate=operate@entry=0x7ffff074b420 <dlclose_doit>, args=0x0) at dlerror.c:170
#4 0x00007ffff074b468 in __dlclose (handle=<optimized out>) at dlclose.c:46
#5 0x00007fffe0ae6f59 in ?? () from /usr/lib/wsl/drivers/nvddi.inf_amd64_0e2fb78c67ddb7a5/libcuda.so.1.1
#6 0x00007fffe0a55df1 in ?? () from /usr/lib/wsl/drivers/nvddi.inf_amd64_0e2fb78c67ddb7a5/libcuda.so.1.1
#7 0x00007fffe0b5bda8 in ?? () from /usr/lib/wsl/drivers/nvddi.inf_amd64_0e2fb78c67ddb7a5/libcuda.so.1.1
#8 0x0000555555816976 in cudart::cudaApiGraphicsGLRegisterImage(cudaGraphicsResource**, unsigned int, unsigned int, unsigned int) ()
#9 0x00005555558586a1 in cudaGraphicsGLRegisterImage ()
#10 0x000055555575c28f in ngp::GLTexture::CUDAMapping::CUDAMapping (this=0x55555852a020, texture_id=<optimized out>, size=...) at /home/xxx/projects/instant-ngp/src/render_buffer.cu:177
#11 0x000055555575c4af in std::make_unique<ngp::GLTexture::CUDAMapping, unsigned int, Eigen::Matrix<int, 2, 1, 0, 2, 1>&> () at /usr/include/c++/9/bits/unique_ptr.h:856
#12 ngp::GLTexture::surface (this=0x555557fcc1e0) at /home/peter/projects/instant-ngp/src/render_buffer.cu:92
#13 0x000055555575bde0 in ngp::CudaRenderBuffer::surface (this=0x555557fcc240) at /usr/include/c++/9/bits/shared_ptr_base.h:1020
#14 ngp::CudaRenderBuffer::tonemap (this=this@entry=0x555557fcc240, exposure=0, background_color=..., output_color_space=output_color_space@entry=ngp::EColorSpace::SRGB, stream=0x555556b71fb0)
at /home/xxx/projects/instant-ngp/src/render_buffer.cu:540
#15 0x00005555555dd844 in ngp::Testbed::render_frame (this=0x7fffffffbe60, camera_matrix0=..., camera_matrix1=..., render_buffer=..., to_srgb=<optimized out>) at /home/xxx/projects/instant-ngp/src/testbed.cu:1985
#16 0x00005555555df71a in ngp::Testbed::draw_contents (this=0x7fffffffbe60) at /usr/include/c++/9/bits/stl_iterator.h:806
#17 0x00005555555ed4eb in ngp::Testbed::frame (this=0x7fffffffbe60) at /home/xxx/projects/instant-ngp/src/testbed.cu:1413
#18 0x0000555555597898 in main (argc=3, argv=0x7fffffffde08) at /home/xxx/projects/instant-ngp/src/main.cu:229
(gdb)
Thank you, this is super useful!
CUDA's GL interop API (cudaGraphicsGLRegisterImage()) seems to segfault rather than returning an error code on WSL2 now. It used to work on the WSL beta driver, but I can now also reproduce the segfault on the latest driver.
Unfortunately, I don't see an elegant way to programmatically detect this and work around it -- for the time being, you can replace the line
static bool IS_CUDA_INTEROP_SUPPORTED = true;
with the line
static bool IS_CUDA_INTEROP_SUPPORTED = false;
in src/render_buffer.cu.
Gonna leave this issue open until the segfault is gone, though.
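(For what it's worth, detecting WSL itself is simple; what's missing is a way to detect the broken interop specifically. The check later adopted upstream boils down to something like the following sketch, shown here in Python for illustration and assuming WSL kernels report "microsoft" in their release string, which holds for both WSL1 and WSL2:)

```python
from pathlib import Path

def is_wsl() -> bool:
    """Heuristic WSL check: WSL kernels include 'microsoft' in the kernel
    release string (e.g. '5.10.102.1-microsoft-standard-WSL2')."""
    try:
        release = Path("/proc/sys/kernel/osrelease").read_text()
    except OSError:
        return False  # no procfs entry -> not a Linux kernel, so not WSL
    return "microsoft" in release.lower()
```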
Thanks, it is working now.
@pdmct:
> Thanks, it is working now.

Do you mean it's working out of the box for you, or do you mean with the "static bool IS_CUDA_INTEROP_SUPPORTED = false;" workaround?
Out of the box I am still seeing this error with:
Windows 11 Version 10.0.22000.593
WSL kernel version: 5.10.102.1
Ubuntu 21.10 (Ubuntu Community Preview)
RTX 2080 Ti
Studio Driver Version 512.15
cuda-repo-wsl-ubuntu-11-6-local_11.6.2-1_amd64.deb
WSL NVIDIA-SMI 510.60.02
Yes, once I updated the flag it worked for me.
Hey, I hit this error and the flag update worked for me, but a segfault still gets thrown when I click "Mesh it!".
I hit this error too and unfortunately couldn't resolve it with the method described here. In the current src/render_buffer.cu, the line
static bool IS_CUDA_INTEROP_SUPPORTED = true;
has been renamed to
static bool s_is_cuda_interop_supported = true;
I changed s_is_cuda_interop_supported from true to false, but the segmentation fault remains.
Hmm... it seems the line has since been replaced with "static bool s_is_cuda_interop_supported = !is_wsl();". Still, "Mesh it!" does not work and throws a segmentation fault. Any idea why?
That's because the mesh rendering code doesn't respect s_is_cuda_interop_supported and tries to use CUDA<->GL interop regardless. FWIW, given the marginal benefit, I think rewriting the mesh rendering pipeline is more trouble than it's worth. I recommend using the Python bindings rather than the GUI to produce meshes on WSL instead. Cheers!
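A minimal sketch of that route, assuming the pyngp module was built alongside the testbed (the call names below mirror how the project's scripts/run.py drives the bindings; treat them as assumptions if your checkout differs):

```python
def export_mesh_headless(scene_dir, mesh_path, n_steps=2000, resolution=256):
    """Train on a NeRF scene and export a marching-cubes mesh without the GUI.

    Assumes the pyngp module produced by instant-ngp's CMake build is on
    sys.path; the calls mirror the project's scripts/run.py.
    """
    import pyngp as ngp  # deferred import so this file loads without pyngp

    testbed = ngp.Testbed(ngp.TestbedMode.Nerf)
    testbed.load_training_data(scene_dir)
    testbed.shall_train = True

    # Headless training loop: frame() advances training one step per call.
    while testbed.frame():
        if testbed.training_step >= n_steps:
            break

    # Run marching cubes on the trained density field and write the mesh
    # (output format follows the file extension, e.g. .obj or .ply).
    testbed.compute_and_save_marching_cubes_mesh(mesh_path, [resolution] * 3)
```

Usage would be something like export_mesh_headless("data/nerf/fox", "fox.ply"), run from the repository root so the relative scene path resolves.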