feat: CUDA/OptiX implementation of TextureSystem

Open lgritz opened this issue 2 months ago • 4 comments

Most production renderers that use OSL for shading also use OIIO's TextureSystem as their texture engine, and quite a few non-OSL renderers use it as well. But as those renderers add GPU support with CUDA and OptiX, they are pretty much on their own when it comes to texturing (OSL gives them a customization point where the renderer itself provides texture facilities).

We've talked for some time about OIIO providing a CUDA/OptiX implementation of TS, and it's time to finally get it underway. I think we can get to an MVP in main relatively rapidly, and then continue work to make this a major supported feature.

I would consider the basic design goals to be the following:

  • Delivered as part of OIIO somehow, and in a form easily consumable by OSL and other renderers.
  • With full feature/quality parity with the existing and future development of the CPU-based OIIO TS (this can happen in stages, with an MVP delivering basic functionality and additions over time getting us to full parity and higher quality).
  • With as close as possible an API to existing TS.
  • Sharing as much computational code as possible between CPU and GPU implementations.
  • Using the OptiX toolkit's on-demand texture loading to provide similar functionality to ImageCache (or maybe as part of, or in conjunction with IC?).

Using this issue to plan, discuss design, track progress, and rally participation from interested parties.

lgritz avatar Oct 22 '25 16:10 lgritz

We're definitely interested in contributing @lgritz. @curtisblack and Thibault V would be the representatives on our end at Netflix Animation Studios. Thanks for raising this initiative! We are super excited to get involved.

etheory avatar Oct 23 '25 01:10 etheory

Thanks, that's awesome. SPI and NVIDIA will also be there and ready to pitch in.

Just to repeat here: We will dedicate the next OSL TSC meeting (Oct 23, 2pm PT) and the next OIIO TSC meeting (Nov 3, 2pm PT) to this topic. Visit https://calendar.aswf.io to find Zoom links.

lgritz avatar Oct 23 '25 02:10 lgritz

Notes from meeting today (this is a braindump, from just the meeting, and some other ideas I had, but it's not an official or final set of notes):

Start with testrender in OSL, develop this functionality there as a group, then figure out how to move it back into OIIO.

The initial implementation is likely to be tied to CUDA + OptiX, but the long-term goal would be to support arbitrary platforms, CPUs, and GPUs.

Would it be useful to bring the free-function additions that were made for OSL into OIIO, to support things like:

  • Mip-map selection
  • Shader-based mip-map scaling/editing
  • The choice of footprint, be it conical or trapezoidal, or something else
  • The filtering method, sample weighting
  • Stochastic vs non-stochastic operation
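To make the mip-map selection and stochastic items above concrete, here is a minimal sketch of what such free functions could look like. The names `mip_level_from_footprint` and `select_mip_stochastic` are hypothetical and not part of OIIO or OSL; the sketch assumes an isotropic choice based on the longer footprint axis and a caller-supplied random number `u`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical free function (not an OIIO API): continuous mip level for a
// lookup whose footprint is given by derivatives of normalized (s,t)
// coordinates. width/height are the level-0 resolution in texels.
inline float mip_level_from_footprint(float dsdx, float dtdx,
                                      float dsdy, float dtdy,
                                      int width, int height, int num_levels)
{
    assert(width > 0 && height > 0 && num_levels > 0);
    // Length of each footprint axis, measured in level-0 texels.
    float len_x = std::hypot(dsdx * width, dtdx * height);
    float len_y = std::hypot(dsdy * width, dtdy * height);
    float longest = std::max(len_x, len_y);
    // A footprint of 2^k texels maps to level k; magnification stays at 0.
    float level = longest > 1.0f ? std::log2(longest) : 0.0f;
    return std::min(level, float(num_levels - 1));
}

// Hypothetical stochastic variant: instead of blending two levels, pick one
// of them with probability equal to the fractional part. u must be in [0,1).
inline int select_mip_stochastic(float level, float u, int num_levels)
{
    assert(level >= 0.0f && u >= 0.0f && u < 1.0f && num_levels > 0);
    int lower = int(std::floor(level));
    float frac = level - float(lower);
    int chosen = (u < frac) ? lower + 1 : lower;
    return std::min(chosen, num_levels - 1);
}
```

Keeping these as small free functions is one way a renderer could swap in its own footprint or selection policy while the rest of the lookup stays shared.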

Complicating factors:

  • ptex support
  • How to handle the case where a tile is not available (e.g. return a single colour, fall back to the lowest mip level, etc.). What's a fast way to do this?
  • Do we need some change to the API to return true if a tile is available, and false if it isn't to allow the implementation to figure out what to do, or should OIIO do this?
  • GPUs are memory constrained, so should OIIO handle the primary memory limiting, or should this be deferred to the implementation in the renderer? Unclear, as everyone had different thoughts on this. I feel that by default OIIO should handle this limiting of memory.
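One possible shape for the "tile not available" question, as a rough sketch only: the lookup reports whether the exact tile was resident, and otherwise hands back the nearest coarser tile that is. Everything here (`TileId`, `ResidentTiles`, `find_tile_or_fallback`) is a hypothetical illustration rather than an existing OIIO or OptiX interface, and it assumes the tile grid halves in each dimension from one mip level to the next.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical tile identifier: mip level plus tile grid position.
struct TileId { int level, x, y; };

// Hypothetical residency set standing in for whatever the real cache tracks.
class ResidentTiles {
public:
    void mark_resident(TileId t) { m_tiles.insert(key(t)); }
    bool is_resident(TileId t) const { return m_tiles.count(key(t)) != 0; }
private:
    // Pack level (6 bits), x (29 bits), and y (29 bits) into one key.
    static std::uint64_t key(TileId t) {
        assert(t.level >= 0 && t.level < 64);
        assert(t.x >= 0 && t.x < (1 << 29) && t.y >= 0 && t.y < (1 << 29));
        return (std::uint64_t(t.level) << 58) | (std::uint64_t(t.x) << 29)
               | std::uint64_t(t.y);
    }
    std::unordered_set<std::uint64_t> m_tiles;
};

// Result of a lookup: `exact` is the true/false answer discussed above,
// `found` says whether any fallback tile exists, `tile` is what to sample.
struct TileLookup { bool exact; bool found; TileId tile; };

// Walk toward coarser levels until a resident tile is found.
// Assumes each coarser tile covers a 2x2 block of finer tiles.
inline TileLookup find_tile_or_fallback(const ResidentTiles& cache,
                                        TileId wanted, int num_levels)
{
    TileId t = wanted;
    while (t.level < num_levels) {
        if (cache.is_resident(t))
            return { t.level == wanted.level, true, t };
        t = TileId{ t.level + 1, t.x / 2, t.y / 2 };
    }
    return { false, false, wanted };  // nothing resident: caller picks a colour
}
```

With this shape, the renderer decides what to do when `exact` is false (queue a load, accept the blurrier result, or use a constant colour), which leaves the policy question open rather than answering it.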

Thanks!

etheory avatar Oct 23 '25 22:10 etheory

For Cycles we're implementing a custom texture system. Probably I won't have time to get involved in this in the near term since the timing didn't work out, but very happy to see this happening.

Our core requirements are perhaps a bit different since we never exposed the more advanced OpenImageIO texture system features to end users, and we support non-NVIDIA GPUs. In case it's useful, this is how I'm trying to optimize performance and memory usage. Somewhat similar to how megatextures work in games.

  • Filtering is only bilinear or bicubic, no complex filters like EWA. Assuming that we will take many samples anyway.
  • Each tile has border pixels that duplicate pixels from neighboring tiles. Stochastic mip level sampling is used. This makes it possible to access a single tile for every texture lookup, with hardware accelerated texture interpolation and no divergence on the GPU.
  • We plan to add support for block compression (or even neural texture compression?) in the future, to load more tiles in the same memory. For interactive renders, but also curious how big the difference really is for offline renders.
  • For quick loading, the bordered and compressed tiles need to be stored directly in the file, which requires a custom or customized file format.
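As a rough illustration of the bordered-tile idea (not the actual Cycles code), the sketch below maps a lookup to a single tile plus a coordinate inside that tile's stored, bordered image. The names `TileCoord` and `bordered_tile_coord`, and the layout of `tile_size` interior texels with `border` duplicated texels on each side, are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Which tile to read, and where inside its stored (bordered) image, in texels.
struct TileCoord {
    int tile_x, tile_y;
    float u, v;
};

// Hypothetical addressing for tiles stored as (tile_size + 2*border)^2 texels.
// s and t are normalized coordinates in [0,1] at the already-chosen mip level.
inline TileCoord bordered_tile_coord(float s, float t,
                                     int level_width, int level_height,
                                     int tile_size, int border)
{
    assert(level_width > 0 && level_height > 0);
    assert(tile_size > 0 && border >= 1);
    // Continuous texel position within this mip level.
    float x = s * level_width;
    float y = t * level_height;
    // Tile index, clamped so s == 1 or t == 1 still lands in the last tile.
    int tx = std::min(int(std::floor(x / tile_size)), (level_width - 1) / tile_size);
    int ty = std::min(int(std::floor(y / tile_size)), (level_height - 1) / tile_size);
    tx = std::max(tx, 0);
    ty = std::max(ty, 0);
    // Shift by the border so a bilinear tap that reaches one texel past the
    // interior still reads duplicated texels from this same tile.
    float u = x - float(tx * tile_size) + float(border);
    float v = y - float(ty * tile_size) + float(border);
    return { tx, ty, u, v };
}
```

With a border of one texel, a bilinear lookup anywhere in the tile interior touches only texels stored in that tile, which is what allows a single hardware-filtered fetch per lookup; a bicubic filter would need a wider border.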

brechtvl avatar Oct 23 '25 23:10 brechtvl