OpenShadingLanguage
OptiX testrender overhaul
Description
This PR adds support for full path tracing in the OptiX mode of testrender, including full BSDF sampling and evaluation. The benefit of this change is that we're able to render most of the scenes from the testsuite in OptiX mode with results that closely match the host output. This comes at the cost of increased coupling between the host and OptiX renderers, and therefore an increased maintenance burden.
ID-based dispatch
The main difference between the host and OptiX paths is how the individual BSDFs are evaluated in the CompositeBSDF class. Virtual function calls aren't well supported in OptiX, so rather than using regular C++ polymorphism to invoke the sample(), eval(), and get_albedo() functions for each of the BSDF sub-types, we manually invoke the correct function based on the closure ID (which we have added as a member of the BSDF class).
```cpp
// from shading_cuda.cpp
#define BSDF_CAST(BSDF_TYPE, bsdf) reinterpret_cast<const BSDF_TYPE*>(bsdf)

OSL_HOSTDEVICE Color3
CompositeBSDF::get_bsdf_albedo(const BSDF* bsdf, const Vec3& wo) const
{
    ...
    switch (bsdf->id) {
    case DIFFUSE_ID:
        albedo = BSDF_CAST(Diffuse<0>, bsdf)->get_albedo(wo);
        break;
    case TRANSPARENT_ID:
    case MX_TRANSPARENT_ID:
        albedo = BSDF_CAST(Transparent, bsdf)->get_albedo(wo);
        break;
    ...
```
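To make the dispatch pattern concrete outside the renderer, here is a minimal, self-contained sketch of the same idea. The types, IDs, and albedo values are hypothetical simplifications (the real `Diffuse<0>` is a template and `get_albedo()` takes `wo`); the sketch uses `static_cast` because its BSDF types derive from `BSDF`.

```cpp
#include <cassert>

// Hypothetical, simplified types for illustration only.
struct Color3 { float r, g, b; };

enum ClosureID { DIFFUSE_ID, TRANSPARENT_ID };

// The base class carries a type tag instead of a vtable pointer.
struct BSDF {
    ClosureID id;
};

struct Diffuse : BSDF {
    Color3 albedo;
    explicit Diffuse(Color3 a) : BSDF{DIFFUSE_ID}, albedo(a) {}
    Color3 get_albedo() const { return albedo; }
};

struct Transparent : BSDF {
    Transparent() : BSDF{TRANSPARENT_ID} {}
    Color3 get_albedo() const { return {1.0f, 1.0f, 1.0f}; }  // illustrative value
};

#define BSDF_CAST(BSDF_TYPE, bsdf) static_cast<const BSDF_TYPE*>(bsdf)

// Manual dispatch: a switch on the stored ID replaces a virtual call,
// so every call target is known at compile time.
Color3 get_bsdf_albedo(const BSDF* bsdf)
{
    switch (bsdf->id) {
    case DIFFUSE_ID:
        return BSDF_CAST(Diffuse, bsdf)->get_albedo();
    case TRANSPARENT_ID:
        return BSDF_CAST(Transparent, bsdf)->get_albedo();
    }
    assert(false && "unknown closure ID");
    return {0.0f, 0.0f, 0.0f};
}
```

The trade-off is extensibility: adding a new BSDF means adding a case to every such switch (for `sample()`, `eval()`, and `get_albedo()`), which is part of the maintenance cost mentioned in the description.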
Iterative closure evaluation
Another key difference from the host path is the non-recursive closure evaluation. We retain the same style of iterative tree traversal used in the previous OptiX version of process_closure(). This PR also adds evaluate_layer_opacity(), process_medium_closure(), and process_background_closure(), which follow the same evaluation pattern.
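As an illustration of the iterative style, the sketch below flattens a closure tree into weighted lobes using an explicit fixed-size stack instead of recursion, which avoids relying on a deep device call stack. The node layout (`Closure`, `NodeType`, `Lobe`) is hypothetical and much simpler than OSL's real closure types; `std::vector` is used for the output only for brevity on the host.

```cpp
#include <cassert>
#include <vector>

// Hypothetical closure-tree node, for illustration only.
enum NodeType { ADD, MUL, COMPONENT };

struct Closure {
    NodeType type;
    const Closure* a = nullptr;  // ADD: left child; MUL: the scaled child
    const Closure* b = nullptr;  // ADD: right child
    float weight = 1.0f;         // MUL: scale factor; COMPONENT: own weight
    int component_id = -1;       // COMPONENT only
};

struct Lobe {
    int component_id;
    float weight;  // product of all weights along the path from the root
};

constexpr int kMaxStack = 64;

// Walk the tree with an explicit stack of (node, accumulated weight).
std::vector<Lobe> flatten_closure(const Closure* root)
{
    struct Entry { const Closure* node; float w; };
    Entry stack[kMaxStack];
    int top = 0;
    std::vector<Lobe> lobes;
    if (root)
        stack[top++] = {root, 1.0f};
    while (top > 0) {
        Entry e = stack[--top];
        if (!e.node)
            continue;  // empty closure: contributes nothing
        switch (e.node->type) {
        case ADD:
            assert(top + 2 <= kMaxStack);
            // Push right first so the left child is processed first.
            stack[top++] = {e.node->b, e.w};
            stack[top++] = {e.node->a, e.w};
            break;
        case MUL:
            assert(top + 1 <= kMaxStack);
            stack[top++] = {e.node->a, e.w * e.node->weight};
            break;
        case COMPONENT:
            lobes.push_back({e.node->component_id, e.w * e.node->weight});
            break;
        }
    }
    return lobes;
}
```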
subpixel_radiance()
The raytracing pipeline mirrors the host code very closely, including camera ray generation and the spawning of secondary rays. This allows a close visual match between the host and OptiX modes.
We've implemented a CUDA version of subpixel_radiance() (in optix_raytracer.cu) that closely mirrors the host version, with the main difference being in how rays are traced and how the shaders are executed. It might be possible to unify the implementations if it would ease the maintenance burden, but for now it seemed cleaner to leave them separate.
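If the two implementations were ever unified, one possible shape is a single bounce loop templated on the backend-specific step (host-side intersection and shading, or `optixTrace()` plus compiled shaders in OptiX). The sketch below is hypothetical: `ShadeResult`, `path_radiance()`, and the callback are illustrative names, and it omits most of what a real renderer does at each hit, such as light sampling and background lookup.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };
struct Color3 { float r, g, b; };
struct Ray { Vec3 origin, dir; };

// Hypothetical result of tracing one ray and shading the hit point.
struct ShadeResult {
    bool hit;         // false if the ray escaped the scene
    Color3 emission;  // radiance emitted at the hit point
    Color3 weight;    // BSDF sample weight (f * cos / pdf) for the next ray
    Ray next;         // sampled continuation ray
};

inline Color3 mul(Color3 a, Color3 b) { return {a.r * b.r, a.g * b.g, a.b * b.b}; }
inline Color3 add(Color3 a, Color3 b) { return {a.r + b.r, a.g + b.g, a.b + b.b}; }

// Iterative path loop: at most max_bounces + 1 path vertices are shaded.
// Only TraceShade would differ between the host and OptiX backends.
template<typename TraceShade>
Color3 path_radiance(Ray ray, TraceShade trace_shade, int max_bounces)
{
    Color3 radiance   = {0.0f, 0.0f, 0.0f};
    Color3 throughput = {1.0f, 1.0f, 1.0f};
    for (int bounce = 0; bounce <= max_bounces; ++bounce) {
        ShadeResult s = trace_shade(ray);
        if (!s.hit)
            break;  // a full renderer would add background radiance here
        radiance   = add(radiance, mul(throughput, s.emission));
        throughput = mul(throughput, s.weight);
        ray        = s.next;
    }
    return radiance;
}
```

For example, a "furnace" callback that always hits, emits 1, and returns a weight of 0.5 yields 1 + 0.5 + 0.25 + 0.125 = 1.875 with `max_bounces = 3`.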
Background sampling
We've included support for background closures, including a CUDA implementation of Background::prepare(). We've split that function into three phases: phases 1 and 3 are parallelized across a warp, and phase 2 runs on a single thread. This gives a decent speedup over a single-threaded implementation while avoiding the complexity of a fully parallel one.
```cpp
// from background.h
template<typename F>
OSL_HOSTDEVICE void prepare_cuda(int stride, int idx, F cb)
{
    prepare_cuda_01(stride, idx, cb);
    if (idx == 0)
        prepare_cuda_02();
    prepare_cuda_03(stride, idx);
}
```
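The following host-side sketch shows why this particular split works, under an assumed division of labor: phase 1 evaluates a per-texel sampling weight, phase 2 builds a running sum (inherently serial in this simple form), and phase 3 normalizes each entry independently. `BackgroundTable` and `prepare_emulated()` are hypothetical, the table is 1D rather than whatever layout testrender actually uses, and the lanes of a 32-wide warp are emulated with loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1D stand-in for the background sampling table.
struct BackgroundTable {
    std::vector<float> weights;  // phase 1 output: per-texel importance
    std::vector<float> cdf;      // phases 2 and 3 output: normalized running sum
    float total = 0.0f;

    explicit BackgroundTable(int n) : weights(n), cdf(n) {}

    // Phase 1 (parallel): lane idx handles texels idx, idx + stride, ...
    template<typename F>
    void phase1(int stride, int idx, F eval)
    {
        for (int i = idx; i < (int)weights.size(); i += stride)
            weights[i] = eval(i);
    }

    // Phase 2 (single lane): each sum depends on all previous entries.
    void phase2()
    {
        float sum = 0.0f;
        for (std::size_t i = 0; i < weights.size(); ++i) {
            sum += weights[i];
            cdf[i] = sum;
        }
        total = sum;
    }

    // Phase 3 (parallel): entries are normalized independently.
    void phase3(int stride, int idx)
    {
        for (int i = idx; i < (int)cdf.size(); i += stride)
            cdf[i] = total > 0.0f ? cdf[i] / total : 0.0f;
    }
};

// Host emulation: every lane finishes a phase before the next phase
// starts. On the device, the same ordering has to be guaranteed somehow
// (for example with __syncwarp() between phases).
template<typename F>
void prepare_emulated(BackgroundTable& t, F eval, int warp_size = 32)
{
    for (int lane = 0; lane < warp_size; ++lane)
        t.phase1(warp_size, lane, eval);
    t.phase2();
    for (int lane = 0; lane < warp_size; ++lane)
        t.phase3(warp_size, lane);
}
```

A more sophisticated version could replace phase 2 with a parallel prefix scan, at the cost of the extra complexity the description chose to avoid.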
Tests
Checklist:
- [x] I have read the contribution guidelines.
- [x] I have updated the documentation, if applicable.
- [x] I have ensured that the change is tested somewhere in the testsuite (adding new test cases if necessary).
- [x] My code follows the prevailing code style of this project. If I haven't already run clang-format v17 before submitting, I definitely will look at the CI test that runs clang-format and fix anything that it highlights as being nonconforming.