Crash in SampledGrid::Lookup() method
(I was actually trying to reproduce this when I stumbled across #150)
When rendering with --gpu or --wavefront you can see the Medium's bounding Shape as an artifact.
Rendered with c65e25c
CPU - VolPath
pbrt wavefront_bug.pbrt --outfile cpu.png
This is what I would expect the render to look like.

GPU
pbrt --gpu wavefront_bug.pbrt --outfile gpu.png
You can see a box shape artifact.

CPU - Wavefront
pbrt --wavefront wavefront_bug.pbrt --outfile wavefront.png
Release Mode
There is a SegFault
(gdb)
#0 0x0000000000000000 in ?? ()
#1 0x0000555555ad0d8f in auto pbrt::SampledGrid<float>::Lookup<__nv_hdl_wrapper_t<false, true, __nv_dl_tag<float (pbrt::SampledGrid<float>::*)(pbrt::Point3<float> const&) const, &(pbrt::SampledGrid<float>::Lookup(pbrt::Point3<float> const&) const), 1u>, float (float)> >(pbrt::Point3<float> const&, __nv_hdl_wrapper_t<false, true, __nv_dl_tag<float (pbrt::SampledGrid<float>::*)(pbrt::Point3<float> const&) const, &(pbrt::SampledGrid<float>::Lookup(pbrt::Point3<float> const&) const), 1u>, float (float)>) const ()
#2 0x0000555555b048f8 in pbrt::SampledSpectrum pbrt::CuboidMedium<pbrt::UniformGridMediumProvider>::SampleT_maj<pbrt::WavefrontPathIntegrator::SampleMediumInteraction(int)::{lambda(pbrt::MediumSampleWorkItem)#1}::operator()(pbrt::MediumSampleWorkItem) const::{lambda(pbrt::MediumSample const&)#1}>(pbrt::Ray, float, float, pbrt::RNG&, pbrt::SampledWavelengths const&, pbrt::WavefrontPathIntegrator::SampleMediumInteraction(int)::{lambda(pbrt::MediumSampleWorkItem)#1}::operator()(pbrt::MediumSampleWorkItem) const::{lambda(pbrt::MediumSample const&)#1}) const ()
Debug Mode
Renders but with the artifact.
Sample Scene
The above example was using a UniformGrid Medium, but the same happens with the NanoVDB Medium as well.

In the GPU render you can see the back side of the box Shape containing the medium. (The camera is outside of the box.)
(Nice explosion!)
One piece of good news is that the visible box will go away if you increase the integrator's 'maxdepth'. This stems from the GPU/wavefront integrator measuring depth differently than the regular CPU integrators: it includes medium transition interfaces in its depth count, while the regular CPU path does not. This is fairly unfortunate, but is not easily fixed, especially at this point. We should probably increase the default maxdepth to make this issue pop up less often.
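To make the depth-counting difference concrete, here is a minimal sketch. It is not pbrt's actual integrator code: the `Event` enum and the `EventsProcessed` function are hypothetical names made up for illustration, and the loop is a heavy simplification of a path tracer. It only shows how counting medium interfaces toward the depth limit terminates the same path earlier.

```cpp
#include <cassert>
#include <vector>

// Hypothetical event types along one path (illustration only, not pbrt types).
enum class Event { SurfaceScatter, MediumScatter, MediumInterface };

// Returns how many events of the path get processed before the depth limit
// stops it. With countInterfaces == true, crossing a medium boundary uses up
// one unit of depth (the wavefront-style counting described above); with
// false, only real scattering events do (the regular CPU-style counting).
int EventsProcessed(const std::vector<Event>& path, int maxDepth,
                    bool countInterfaces) {
    assert(maxDepth >= 0);
    int depth = 0;
    int processed = 0;
    for (Event e : path) {
        if (depth >= maxDepth) break;  // path terminated by the depth limit
        ++processed;
        const bool isInterface = (e == Event::MediumInterface);
        if (!isInterface || countInterfaces) ++depth;
    }
    return processed;
}
```

For a path that enters the box, scatters once in the medium, leaves the box, and then hits a surface, a limit of 2 lets the CPU-style count process all four events, while the interface-counting variant stops after the first scatter. Raising the limit gives the second variant room to finish the same path.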
The crash is interesting... (And particularly surprising, for such a simple scene, very similar to many that have long rendered successfully.) Things learned:
- Doesn't reproduce on OSX
- Does reproduce on Linux, but only in release builds (as you note), and only when building with NVCC for GPU support
- Though the crash happens in code that runs on the CPU
- Valgrind reports nothing amiss
- Debugging printfs indicate that the density value is being read successfully from memory, which further indicates not a memory stomp.
- The crash seems to be in the lambda passed into the grid's Lookup() method. But in this case, the lambda is just a pass-through function that returns the value passed to it.
- If I take parameter values that make it crash and make the exact same call elsewhere (e.g. in the grid's constructor), it's fine
I hate to blame the compiler, but this smells like a compiler bug. I'm seeing it with CUDA 11.2.152. (Which version are you using?)
I have just pushed a workaround that adds grid Lookup() methods that don't take that conversion lambda, for cases where it isn't needed. That is obviously highly unsatisfying, but hopefully it keeps you moving forward. I'll continue to dig in.
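For readers who want a picture of the shape of that workaround, here is a rough sketch. It is not the code that was pushed: `SimpleGrid` is a hypothetical stand-in for the real grid class, and it uses integer cell coordinates rather than a point lookup with interpolation. The idea is just that a second overload returns the stored value directly, so callers that would only pass an identity lambda never instantiate the templated path.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a sampled grid (not pbrt's class).
template <typename T>
class SimpleGrid {
  public:
    SimpleGrid(std::vector<T> v, int nx, int ny, int nz)
        : values(std::move(v)), nx(nx), ny(ny), nz(nz) {
        assert(values.size() == std::size_t(nx) * ny * nz);
    }

    // General form: the stored value goes through a caller-supplied
    // conversion callable before being returned.
    template <typename F>
    auto Lookup(int x, int y, int z, F convert) const {
        return convert(At(x, y, z));
    }

    // Direct form: no conversion callable at all, for callers that would
    // otherwise pass a pass-through lambda.
    T Lookup(int x, int y, int z) const { return At(x, y, z); }

  private:
    const T& At(int x, int y, int z) const {
        assert(x >= 0 && x < nx && y >= 0 && y < ny && z >= 0 && z < nz);
        return values[(std::size_t(z) * ny + y) * nx + x];
    }

    std::vector<T> values;
    int nx, ny, nz;
};
```

With both overloads present, `grid.Lookup(x, y, z)` and `grid.Lookup(x, y, z, [](float v) { return v; })` return the same value, but only the second one involves a lambda.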
Good to know about the max depth! When rendering on a GPU what's a few extra bounces going to matter. :D
For the crashing - CUDA: 11.2.1, (nvcc 11.2.142) gcc: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
I'll give the CUDA Toolkit 11.4 a go later today.
Crashes with 11.4.48 as well. (Unless the issue is on the gcc side, where I could give clang a try.)
Interesting update:
- The example scene no longer crashes when rendering with --wavefront mode with 11ef9b3.
- Just to verify that it wasn't due to an updated version of nvcc (11.4.100) I recompiled with c65e25c and the crash returned.
- A git bisect reports that 971a4bc, Stop using monotonic_buffer_resource for big allocs in WavefrontIntegrator constructor, resolved/avoided the compiler issue.
Yay?