[:bug: bug report] Clang 14 causes DrJIT AD crash

Open bathal1 opened this issue 2 years ago • 2 comments

When Mitsuba 3 is compiled with Clang 14, differentiating through a rendering of a scene containing a microfacet BSDF (roughplastic, roughdielectric or roughconductor) causes Dr.Jit to crash with the following message:

Critical failure in Dr.Jit AD backend: referenced an unknown variable a3582072800!

It appears that:

  • Compiling Mitsuba 3 with earlier versions of Clang (up to and including 13) does not produce this behavior
  • Compiling with Clang 14 in Debug mode does not produce this behavior
  • Disabling virtual function calls and loop recording makes the crash go away
  • This happens in both cuda_ad_rgb and llvm_ad_rgb
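
For anyone who wants to check the third observation, here is a minimal sketch of turning off both recording modes before rendering. It assumes the flag names `dr.JitFlag.VCallRecord` and `dr.JitFlag.LoopRecord` used by Dr.Jit releases from around the time of this report; newer releases may name them differently. The helper `disable_recording` is hypothetical, not part of Dr.Jit, and takes the module as an argument only so that it can be exercised without a GPU.

```python
def disable_recording(dr):
    """Switch off recorded virtual function calls and recorded loops.

    `dr` is the imported Dr.Jit module. The flag names below are an
    assumption tied to the Dr.Jit version in use when this was reported.
    """
    dr.set_flag(dr.JitFlag.VCallRecord, False)  # virtual function calls
    dr.set_flag(dr.JitFlag.LoopRecord, False)   # loop recording


# Intended usage, placed before the call to mi.render() in the reproducer:
#   import drjit as dr
#   disable_recording(dr)
```

With both flags off the crash reportedly goes away, which narrows the problem down but is not a fix.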

After some investigation, the root of the crash seems to be located in the visible hemisphere sampling of the Microfacet distribution, in particular the line:
https://github.com/mitsuba-renderer/mitsuba3/blob/7ca09a3ad95cec306c538493fa8450a096560891/include/mitsuba/render/microfacet.h#L307

The following "fixes" make the crash go away:

  • Replacing wi_p by a fixed vector
  • Making the sincos_phi function return a tuple {1.0f, 1.0f}

Here is a reproducer:

import mitsuba as mi
mi.set_variant("cuda_ad_rgb")
import drjit as dr

scene_dict = mi.cornell_box()
scene_dict['white'] = {
    'type': 'roughplastic',
    'sample_visible': True
}
scene_dict['integrator'] = {'type': 'direct'}
scene_opt = mi.load_dict(scene_dict)

params = mi.traverse(scene_opt)
dr.enable_grad(params['red.reflectance.value'])

img = mi.render(scene_opt, params, seed=0, spp=1)

loss = dr.mean(dr.sqr(img))
dr.backward(loss)

print(dr.grad(params['red.reflectance.value']))

Some additional system info:

  • OS: Ubuntu 22.04
  • GPU: NVIDIA RTX 3080Ti
  • CPU: AMD Ryzen Threadripper 3990X

bathal1, Sep 27 '22 09:09

@bathal1 I recently made a number of changes to LLVM code generation and cannot reproduce the issue with them. However, my setup might be slightly different from yours.

To be certain, could I ask you to test whether your issue goes away when using the drjit-backend-refactor branch of Mitsuba? (Don't forget to run git submodule update --init --recursive to bring the submodules to the latest version after checking out this branch.)

wjakob, Dec 18 '22 19:12

After compiling the latest changes on the branch you mentioned, I still get the same error on my system, with both the cuda_ad_rgb and llvm_ad_rgb variants.

bathal1, Dec 19 '22 10:12