mitsuba3
[:bug: bug report] Clang 14 causes DrJIT AD crash
When compiling Mitsuba 3 with Clang 14, differentiating through a rendering of a scene that contains a microfacet-based BSDF (`roughplastic`, `roughdielectric`, or `roughconductor`) causes Dr.Jit to crash with the following message:
Critical failure in Dr.Jit AD backend: referenced an unknown variable a3582072800!
It appears that:
- Compiling Mitsuba 3 with earlier versions of Clang (up to and including 13) does not produce this behavior
- Compiling with Clang 14 in `Debug` mode does not produce this behavior
- Disabling virtual function call recording and loop recording makes the crash go away
- This happens in both the `cuda_ad_rgb` and `llvm_ad_rgb` variants
After some investigation, the root cause of the crash appears to lie in the visible-normal sampling of the microfacet distribution, specifically this line:
https://github.com/mitsuba-renderer/mitsuba3/blob/7ca09a3ad95cec306c538493fa8450a096560891/include/mitsuba/render/microfacet.h#L307
The following "fixes" make the crash go away:
- Replacing `wi_p` with a fixed vector
- Making the `sincos_phi` function return the tuple `{1.0f, 1.0f}`
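For context on what those two workarounds short-circuit: in visible-normal sampling, the incident direction is first stretched by the roughness parameters (this is what `wi_p` holds), and the azimuth of the stretched vector is then split into its sine and cosine. The pure-Python sketch below mirrors that first step under stated assumptions: it uses scalar floats instead of Dr.Jit arrays, the names `stretch_wi`, `alpha_u`, and `alpha_v` are hypothetical, the fallback for directions aligned with the normal is assumed, and `sincos_phi` here only shares its name with Mitsuba's helper. It is not Mitsuba's implementation and does not reproduce the crash.

```python
import math


def stretch_wi(wi, alpha_u, alpha_v):
    """Stretch the incident direction by the (hypothetical) roughness values
    alpha_u, alpha_v and renormalize it; this plays the role of `wi_p`."""
    x, y, z = alpha_u * wi[0], alpha_v * wi[1], wi[2]
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)


def sincos_phi(v, eps=1e-7):
    """Return (sin(phi), cos(phi)) of the azimuth of v in the local frame.

    Assumption: when v is (nearly) parallel to +z the azimuth is undefined,
    and this sketch falls back to phi = 0. Mitsuba's fallback may differ.
    """
    sin_theta_2 = v[0] * v[0] + v[1] * v[1]
    if sin_theta_2 <= eps:
        return (0.0, 1.0)
    inv_sin_theta = 1.0 / math.sqrt(sin_theta_2)
    # Clamp to [-1, 1] to guard against rounding error
    cos_phi = min(max(v[0] * inv_sin_theta, -1.0), 1.0)
    sin_phi = min(max(v[1] * inv_sin_theta, -1.0), 1.0)
    return (sin_phi, cos_phi)


# Example: a unit incident direction in the xz-plane (phi = 0)
wi = (0.6, 0.0, 0.8)
wi_p = stretch_wi(wi, alpha_u=0.5, alpha_v=0.2)
print(wi_p, sincos_phi(wi_p))
```

Both workarounds in the list above replace one of these values with a constant, which removes the data dependency between the incident direction and the rest of the sampling routine. That may be related to why either change hides the crash, but the thread does not establish this.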
Here is a reproducer:
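```python
import mitsuba as mi
# The crash occurs with both cuda_ad_rgb and llvm_ad_rgb
mi.set_variant("cuda_ad_rgb")
import drjit as dr

# Start from the built-in Cornell box and replace the white BSDF with a
# rough plastic that uses visible-normal sampling
scene_dict = mi.cornell_box()
scene_dict['white'] = {
    'type': 'roughplastic',
    'sample_visible': True
}
scene_dict['integrator'] = {'type': 'direct'}
scene_opt = mi.load_dict(scene_dict)

# Track gradients with respect to the red wall's reflectance
params = mi.traverse(scene_opt)
dr.enable_grad(params['red.reflectance.value'])

# Differentiable render followed by a simple L2 loss
img = mi.render(scene_opt, params, seed=0, spp=1)
loss = dr.mean(dr.sqr(img))  # newer Dr.Jit releases name this dr.square

# Reverse-mode AD pass, then print the resulting gradient
dr.backward(loss)
print(dr.grad(params['red.reflectance.value']))
```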
Some additional system info:
- OS: Ubuntu 22.04
- GPU: NVIDIA RTX 3080Ti
- CPU: AMD Ryzen Threadripper 3990X
@bathal1 I recently made a number of changes to LLVM code generation and cannot reproduce the issue with them. However, my setup might be slightly different from yours.
To be certain, could you test whether the issue goes away on the `drjit-backend-refactor` branch of Mitsuba? (Don't forget to run `git submodule update --init --recursive` after checking out this branch to bring the submodules up to date.)
After compiling the latest changes on the branch you mentioned, I still get the same error on my system, with both the `cuda_ad_rgb` and `llvm_ad_rgb` variants.