MaterialX icon indicating copy to clipboard operation
MaterialX copied to clipboard

Implement `constant` wrap mode in shader code for HW languages

Open ld-kerley opened this issue 3 months ago • 8 comments

While continuing to investigate differences between GLSL and MSL highlighted in #2697 it became apparent that GLSL and MSL don't support the same behavior for constant wrap mode - so in this PR we implement the GLSL version directly in shader source to ensure portability across languages.

We may take this further in the future and implement the rest of the wrap modes if differences appear.

One alternative solution here could be to implement MSL versions of the image node implementations with this additional code, and continue to rely on the GLSL wrap mode behavior, but this will lead to a higher maintenance burden, and its also possible that other new languages might also behave differently, so a direct source code implementation ensures all future languages follow the same behavior.

glsl_vs_msl_wrap_mode_fix.pdf

ld-kerley avatar Dec 02 '25 20:12 ld-kerley

Although this may end up being a good future strategy for hardware shading languages in MaterialX, I have a lot of questions about the potential performance implications!

Would this approach reduce performance for complex ALU-bound materials in real-time renderers? It doesn't seem obvious to me what the answer would be, and we should be very sure before proceeding.

jstone-lucasfilm avatar Dec 03 '25 00:12 jstone-lucasfilm

My guess is that this conditional and early return would be implemented as a dynamic branch by the shader compiler, which can have significant performance implications in shaders that require a large number of registers.

I'd love to hear from folks with strong, recent expertise in real-time shading, though, and it could be that the proposed new code is less problematic than my own experience in this domain would suggest.

jstone-lucasfilm avatar Dec 03 '25 01:12 jstone-lucasfilm

Afaik, the uv can be computed along with the sampler lookup, and then a select should be done afterwards. The lookup and select shoudl mask each other, and avoid divergent branching.

Apologies for not being more vigilant in the review.

kwokcb avatar Dec 03 '25 02:12 kwokcb

void mx_image_color3($texSamplerSignature,
                     int layer,
                     float3 defaultval,
                     float2 texcoord,
                     int uaddressmode,
                     int vaddressmode,
                     int filtertype,
                     int framerange,
                     int frameoffset,
                     int frameendaction,
                     float2 uv_scale,
                     float2 uv_offset,
                     thread float3& result)
{
    float2 uv = mx_transform_uv(texcoord, uv_scale, uv_offset);

    bool outsideU = (uaddressmode == 0) && (uv.x < 0.0 || uv.x > 1.0);
    bool outsideV = (vaddressmode == 0) && (uv.y < 0.0 || uv.y > 1.0);
    bool useDefault = outsideU || outsideV;

    // Always sample once (avoids divergent control flow)
    float3 sampled = texture($texSamplerSampler2D, uv).rgb;

    // Select default or sample (branchless in Metal).
    // mix() appears to perform better with float3 than select()
    result = mix(sampled, defaultval, float(useDefault))
    // result = select(sampled, defaultval, useDefault);
}

I think you want something like the above.

kwokcb avatar Dec 03 '25 02:12 kwokcb

I'm happy to go with either route - and am certainly not the real time performance expert. In my limited local testing I wasn't able to measure any performance difference at all, but maybe someone with access to more complex production assets could share some concrete numbers to inform which direction we should head here?

I'm very happy to update the code per @bernardkwok suggestion here - in fact that could would be easier share in a library function.

Either way someone with better access to data should profile this - I don't really think the assets we have in the test suite are complex enough perhaps?

ld-kerley avatar Dec 03 '25 04:12 ld-kerley

Do we want to handle the other wrap modes in a similar way here as well? They match with todays GLSL/MSL comparison, but they do require the render engine the shader is being passed to to correctly configure the texture.

It would be possible to write them all in shader source and ensure they are consistent

ld-kerley avatar Dec 03 '25 04:12 ld-kerley

Hi @ld-kerley,

Thanks for following up.

As we don't put in conditional branches (and especially not texture fetches within a branch) the number of ALU instructions added for the test and hence processing time is masked (& should run in parallel) by the much more expensive texture fetch. Looks around 10-12 ALU ops.

Even if we added in all the logic for full wrap support should still be minimal, but maybe best for a later PR to avoid stalling this fix. There can be a few approaches:

  • a generalized version with some macro replacement for optimal intrinsic calls;
  • specialized versions for each language (GLSL, MSL, ESSL etc), or
  • stick with generalized only code.

I've added this issue

@jstone-lucasfilm, does this sound like a reasonable path forward ?

kwokcb avatar Dec 03 '25 15:12 kwokcb

@kwokcb @ld-kerley I'd be interested in a deeper understanding of the differences in support for texture address modes across GLSL, HLSL, MSL, WGSL, and other hardware shading languages.

Is there a broad pattern of divergence across hardware shading languages, or is MSL simply missing one of the modes that GLSL, HLSL, and WGSL all support in a consistent way?

jstone-lucasfilm avatar Dec 04 '25 01:12 jstone-lucasfilm