
MaterialX Shader Generation Abstraction (aka Visitor Pattern)

Open ld-kerley opened this issue 4 months ago • 11 comments

TL;DR

We propose introducing an abstract interface to define the material data passed to the MaterialX shader generation system. This will allow more efficient shader generation in some systems and allow other material data models that leverage the MaterialX standard data library to generate shader code without converting the data.

Overview

The MaterialX shader generation system accepts its input in the form of a MaterialX Element that exists in a MaterialX document. When using this system in a purely MaterialX-based environment, this is a reasonable constraint, but MaterialX is integrated into a number of different runtime environments, including, but not limited to, OpenUSD.

Problem

We will use OpenUSD as an example, as it is possibly the most prevalent MaterialX integration. If we explore that system, we will see that there are ways we can make MaterialX shader generation more efficient by introducing a layer of abstraction for the data model that is input to the shader generation system.

OpenUSD has its own UsdShade schema that is used to describe material nodegraphs and interfaces. There exists a UsdMtlx OpenUSD module that includes a file-format plugin that reads a .mtlx file and converts it to corresponding UsdShade primitives. This then becomes the representation in OpenUSD that a user interacts with, potentially modifying the material using OpenUSD opinions or composition arcs. When the OpenUSD stage is composed, or "flattened", we end up with a concrete UsdShadeMaterial with a nodegraph of UsdShadeShader nodes that conceptually represent the same original material as the .mtlx file, along with any modifications.

This "flattened" OpenUSD stage can then be passed to Hydra, the rendering interface abstraction in OpenUSD. The UsdShade material gets converted to an HdMaterialNetwork by Hydra before being passed to a renderer. Hydra has a corresponding HdMtlx plugin that is used to recreate a MaterialX document, which is then passed to the MaterialX shader generation system by any downstream renderer, including Storm, the OpenGL/Metal-based renderer provided with OpenUSD. It is this recreation of a MaterialX document (the red nodes in the diagram below) that this proposal considers inefficient. It adds a measurable cost to shader generation, and we propose allowing its removal from the system.

---
config:
  theme: redux
  layout: dagre
---
    flowchart LR
    subgraph USD["USD"]
        mtlx(["MtlX Doc"])
        usd_a(["USD Layer"])
        usd_b(["USD Stage"])
        UsdMtlx["UsdMtlx"]
    end
    subgraph Hydra["Hydra"]
        direction LR
        hdNetMtl("HdMaterialNetwork")
        HdMtlx["HdMtlx"]
        mtlx2(["Mtlx Doc"])
    end
    subgraph HdStorm["HdStorm"]
        mtlxShdGen["Mtlx Shader Gen"]
        shdSrc(["Shader Code"])
    end
    mtlx --> UsdMtlx
    UsdMtlx --> usd_b
    usd_a --> usd_b
    hdNetMtl --> HdMtlx
    HdMtlx --> mtlx2
    mtlxShdGen --> shdSrc
    mtlx2 --> mtlxShdGen
    usd_b --> Hydra
    style mtlx2 fill:#662222
    style HdMtlx fill:#662222

Proposal

We propose introducing an abstraction layer, referred to in conversations as a "Visitor Pattern", that will allow the MaterialX shader generation system to be driven by data sources other than a MaterialX document.

In the OpenUSD use case, we would then be able to write a concrete implementation of this abstract interface layer for Hydra. This would allow us to use the data in an HdMaterialNetwork, along with the Shader Definition Registry (SDR), as input to the MaterialX shader generation system. By bypassing the creation of the MaterialX document, we could generate shaders directly from the Hydra data. This pattern could be followed by any other MaterialX integration that has its own data representation for the material nodegraph.

---
config:
  theme: redux
  layout: dagre
---
    flowchart LR
    subgraph USD["USD"]
        mtlx(["MtlX Doc"])
        usd_a(["USD Layer"])
        usd_b(["USD Stage"])
        UsdMtlx["UsdMtlx"]
    end
    subgraph Hydra["Hydra"]
        direction LR
        hdNetMtl("HdMaterialNetwork")
    end
    subgraph HdStorm["HdStorm"]
        mtlxShdGen["Mtlx Shader Gen"]
        shdSrc(["Shader Code"])
    end
    mtlx --> UsdMtlx
    UsdMtlx --> usd_b
    usd_a --> usd_b
    hdNetMtl --> mtlxShdGen
    mtlxShdGen --> shdSrc
    usd_b --> Hydra

Details

The current MaterialX shader generation implementation is tightly coupled to the MaterialX data model, and this abstraction layer would need to break that coupling. There are a number of different aspects of the system that I believe would need to be reworked to introduce a useful abstraction layer here. There will likely be other, more concrete details to resolve as the work unfolds.

Data Ownership

There are a few different places where the MaterialX shader generation system retains shared pointers to the input MaterialX document objects. In order to decouple the shader generation system from the MaterialX data model, these references would need to be removed, and refactored as accesses through the abstract interface.

Once there is no data retention in the shader generator, it should be safe to refactor the system to accept constant raw pointers instead of shared pointers. The data should only ever be read on the shader generation system side. This presumes that the source data model is persistent through the entire process of shader generation, which seems like a reasonable requirement. To the authors’ knowledge, this is true of current MaterialX implementations.
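As a rough illustration of the ownership change described above (the class and member names here are hypothetical, not the actual MaterialXGenShader API), the shift amounts to replacing retained shared pointers with read-only, non-owning access during generation:

```cpp
#include <MaterialXCore/Document.h>

namespace mx = MaterialX;

// Current style (illustrative only): a generator-side object retains shared
// ownership of the source document and may re-query it after graph creation.
class GraphBuilderToday
{
  public:
    void setSource(mx::ConstDocumentPtr doc) { _doc = doc; }

  private:
    mx::ConstDocumentPtr _doc; // shared pointer retained by the generator
};

// Proposed style: the source data is only read while the graph is built and is
// never stored, so a const raw pointer is enough. The caller guarantees the
// data outlives shader generation, per the persistence assumption above.
class GraphBuilderProposed
{
  public:
    void build(const mx::Document* doc); // read-only, non-owning access
};
```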

Object Access

Currently, the MaterialX shader generation system receives the data model via a pointer. This assumes that the data model constructs shared pointers, or indeed exposes pointers at all. I propose developing a data handle system for communication across the abstract interface. This handle would likely be backed by a simple 64-bit integer. This would allow concrete implementations to store pointers, but would also allow other identifiers, such as hashes of string names or even indices, to be stored in the handle. This would greatly reduce the data conversion required of concrete implementers of the abstract interface.
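As a rough sketch (the name and layout are placeholders for discussion, not a proposed final API), such a handle could be as simple as:

```cpp
#include <cstdint>

// Hypothetical opaque handle passed across the abstract interface. It is a
// plain 64-bit value that each concrete implementation interprets as it sees
// fit; the shader generation system never looks inside it.
struct SourceHandle
{
    uint64_t value = 0;

    bool isValid() const { return value != 0; }
};

// Possible backings for a concrete implementation:
//   - a pointer to its own node object:  reinterpret_cast<uint64_t>(nodePtr)
//   - a hash of a string identifier:     std::hash<std::string>{}(nodePath)
//   - an index into an internal array:   nodeIndex + 1  (reserving 0 as invalid)
```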

Data Model Inheritance

The MaterialX data model is based on a deep hierarchy of C++ inheritance. The relationships between the different components in this model may not necessarily be the same in other data representations. This proposal suggests that the abstract interface should not expose that inheritance, as not all data models will follow the same structural relationships.

It is probably most general if the abstract interface is just a set of functions that accept some sort of pointer or handle to the object being referred to. I believe this would allow implementers of the interface to connect it to their data model without restriction.
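One possible shape for this, a single provider whose queries all take opaque handles rather than typed element classes, might look like the following sketch. All names and the exact set of queries are illustrative only:

```cpp
#include <cstdint>
#include <string>
#include <vector>

using SourceHandle = uint64_t; // see the handle sketch above

// Hypothetical flat interface: every query is a function taking an opaque
// handle, with no requirement that the caller's data model objects inherit
// from any MaterialX class.
class MaterialDataSource
{
  public:
    virtual ~MaterialDataSource() = default;

    // Node-level queries; the handle identifies a node in the caller's model.
    virtual std::string nodeCategory(SourceHandle node) const = 0;
    virtual std::string nodeName(SourceHandle node) const = 0;
    virtual std::vector<SourceHandle> nodeInputs(SourceHandle node) const = 0;

    // Input-level queries.
    virtual std::string inputName(SourceHandle input) const = 0;
    virtual std::string inputTypeName(SourceHandle input) const = 0;
    virtual SourceHandle connectedUpstreamNode(SourceHandle input) const = 0;
    virtual std::string inputValueString(SourceHandle input) const = 0;
};
```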

Non-copying interface

As much as possible, the interface should follow similar design goals to the Hydra interface, where data is passed through the interface as directly as possible, with copies made only where necessary.
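For example (again with hypothetical names), an accessor could return a view into storage the data model already owns, rather than a freshly allocated string:

```cpp
#include <cstdint>
#include <string_view>

using SourceHandle = uint64_t; // see the handle sketch above

// Hypothetical non-copying accessor: the provider returns a view into memory
// it already owns. The view only needs to remain valid for the duration of
// shader generation, matching the persistence assumption discussed earlier.
class MaterialDataSourceView
{
  public:
    virtual ~MaterialDataSourceView() = default;

    virtual std::string_view nodeName(SourceHandle node) const = 0;
};
```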

Separation of ShaderGraph creation from Shader Source emission

The MaterialX shader generation system currently consists of two major parts. The first part ingests the data model representing the material and creates an internal representation of ShaderGraph and ShaderNode objects. The second part processes the ShaderGraph and ShaderNode objects to emit the concrete shader source file(s). A single object, GenContext, maintains thread-safe state through the process.

---
config:
  theme: redux
  layout: dagre
---
flowchart LR
    subgraph dia1 ["ShaderGenerator::generate(mtlxdoc)"]
        direction TB
        MtlxDoc1
        subgraph GenContext
        create1["Create ShaderGraph"]
        emit1["Emit Source"]
        create1 --> emit1
        end
        MtlxDoc1 --> create1 & emit1
    end

The second part, the shader emission, should never need to know anything about the original source data model. This second part is also likely the part of the shader generation system where most of the specialization happens for concrete output languages, or downstream integrations.

We propose making a clearer distinction within the shader generation system between these two phases, and reducing the amount of customization possible in the first phase. This simplification will improve robustness by ensuring all the different shader generators create the intermediate representation the same way. It also means that none of the shader generation specializations for the different output languages/integrations should need to know about the abstract interface. The abstract interface's only concern should be facilitating the construction of the intermediate ShaderGraph and ShaderNode representation in the first phase.

Concretely, this would mean creating a set of centralized API calls common to all shader generation implementations, with a generate() entry point that then dispatches to specialized calls to emit the shader source. We also propose decomposing the GenContext object into two parts to service the two respective phases of shader generation: GenContextCreate and GenContextEmit. This separation will help ensure the system does not "leak" any data model dependency into the shader source emission phase over time, and will also simplify the two phases by making clear what per-thread data each stage requires.

---
config:
  theme: redux
  layout: dagre
---
flowchart LR
    subgraph dia2 ["ShaderGenerator::generate(visitor)"]
        direction TB
        subgraph GenCreateContext
            Visitor
            create2["Create ShaderGraph"]
        end
        subgraph GenEmitContext
            emit2["Emit Source"]
        end
        create2 --> emit2
        Visitor --> create2
    end

It is likely that specialization points may be added before and after the ShaderGraph creation, to allow pre- or post-processing of the ShaderGraph in specific backend implementations, but the intention is that the graph creation would largely be consistent across all backends.
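Putting the pieces above together, the two-phase split could be sketched roughly as follows. Every type and function name here (GenContextCreate, GenContextEmit, createShaderGraph, emitShaderSource, and the rest) is a placeholder from this proposal, not existing MaterialXGenShader API:

```cpp
#include <memory>
#include <string>

// Placeholder types standing in for the proposal's concepts.
class MaterialDataSource;               // abstract data-model interface (sketched earlier)
struct ShaderGraph2 {};                 // intermediate representation built in phase 1
struct Shader2 { std::string source; }; // emitted shader source produced in phase 2
struct GenContextCreate {};             // per-thread state needed only for graph creation
struct GenContextEmit {};               // per-thread state needed only for source emission

class ShaderGenerator2
{
  public:
    std::shared_ptr<Shader2> generate(const MaterialDataSource& source,
                                      GenContextCreate& createCtx,
                                      GenContextEmit& emitCtx)
    {
        // Phase 1: common to all backends, and the only phase that ever sees
        // the abstract data-model interface.
        ShaderGraph2 graph = createShaderGraph(source, createCtx);

        // Phase 2: backend/language specific, operating purely on the
        // intermediate ShaderGraph/ShaderNode representation.
        return emitShaderSource(graph, emitCtx);
    }

  protected:
    // Centralized graph construction shared by all shader generators.
    ShaderGraph2 createShaderGraph(const MaterialDataSource& source, GenContextCreate& ctx);

    // Specialized per output language / integration.
    virtual std::shared_ptr<Shader2> emitShaderSource(const ShaderGraph2& graph,
                                                      GenContextEmit& ctx) = 0;
};
```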

Suggested Process

This is a significant piece of work and a large undertaking. There are obviously a number of different approaches we could take. I thought it would be useful to seed conversation with some ideas about how things could unfold. Some of these have been raised by others in previous conversations.

Note all proposed names below are just initial suggestions. While I am very open to bike-shedding better names, I do think it would be productive to postpone the "naming debate" until most of the architectural decisions have been resolved.

Separate Shader Generation module

The current MaterialX shader generation system exists in a MaterialX module, MaterialXGenShader (and associated specialized backends). I propose this work be developed in a new MaterialX module, MaterialXGenShader2 (and potentially other newly associated backends), alongside the existing shader generation module MaterialXGenShader. I believe this will provide two concrete benefits: side-by-side testing (discussed in the next section) and an easier git workflow. If we develop in the same module that is actively being developed in main, then merging/rebasing onto an updated main will be challenging later in the process once significant changes have been made. A separate module should have zero conflicts, and this work would never modify the existing shader generation system, so updates should never (or at least rarely) conflict.

Testing/Validation

As discussed above, developing in a parallel MaterialX module would make testing and validation easier. I would propose building a new test harness (run alongside all the existing tests) where we generate shader code for any given input MaterialX file with both the current shader generation system and the newly developed one. The ultimate validation for the new system would be that it generates identical shader source code in all cases. This may not hold forever, as the new shader generation system may eventually generate improved/simplified/optimized shader code, but at least initially we should expect identical source generation. The point at which that is no longer the case should be a deliberate decision.
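A rough sketch of one comparison case follows, using the existing MaterialXGenShader GLSL generator on one side and a declared-but-not-yet-implemented placeholder for the new module on the other. The library paths and generateWithProposedSystem are assumptions for illustration; the new module's real API is yet to be designed:

```cpp
#include <MaterialXCore/Document.h>
#include <MaterialXFormat/Util.h>
#include <MaterialXFormat/XmlIo.h>
#include <MaterialXGenGlsl/GlslShaderGenerator.h>
#include <MaterialXGenShader/GenContext.h>
#include <MaterialXGenShader/Shader.h>
#include <MaterialXGenShader/Util.h>

#include <cassert>
#include <string>

namespace mx = MaterialX;

// Generate pixel-stage GLSL for the first renderable element of a .mtlx file
// using the existing MaterialXGenShader system.
std::string generateWithCurrentSystem(const mx::FilePath& mtlxFile,
                                      const mx::FileSearchPath& searchPath)
{
    mx::DocumentPtr stdLib = mx::createDocument();
    mx::loadLibraries({ mx::FilePath("libraries") }, searchPath, stdLib);

    mx::DocumentPtr doc = mx::createDocument();
    mx::readFromXmlFile(doc, mtlxFile, searchPath);
    doc->importLibrary(stdLib);

    mx::GenContext context(mx::GlslShaderGenerator::create());
    context.registerSourceCodeSearchPath(searchPath);

    std::vector<mx::TypedElementPtr> elems = mx::findRenderableElements(doc);
    assert(!elems.empty());

    mx::ShaderPtr shader = context.getShaderGenerator().generate("compare", elems[0], context);
    return shader->getSourceCode(mx::Stage::PIXEL);
}

// Hypothetical entry point of the proposed MaterialXGenShader2 module; the
// name and signature are placeholders invented for this sketch.
std::string generateWithProposedSystem(const mx::FilePath& mtlxFile,
                                       const mx::FileSearchPath& searchPath);

void compareGeneration(const mx::FilePath& mtlxFile, const mx::FileSearchPath& searchPath)
{
    const std::string current = generateWithCurrentSystem(mtlxFile, searchPath);
    const std::string proposed = generateWithProposedSystem(mtlxFile, searchPath);

    // Initially we expect byte-identical output; any divergence should be a
    // deliberate, reviewed decision.
    assert(current == proposed);
}
```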

Git Branch

I propose all of this work happen in a separate git branch from main. I do not expect all of this work to land in a single commit. There are a number of concrete phases to this work that can be validated progressively. I would propose that the review process be reconsidered a little for this work, not focusing on code style or efficiency initially, as a lot of the work will be incrementally improved along the way. We may even decide it makes sense to not require review approval at every stage. The benefit of landing the work in stages is to help people understand the thought process behind the work, making the reviews for the later stages easier with more context and understanding. It is important that all the tests pass for every phase of work committed.

Deploy as optional

Once initially complete, I would propose merging this back to main and making it a second optional "opt-in" shader generator interface. This will allow downstream integrators to easily experiment with the new shader generation interface and provide important feedback before we deprecate the existing shader generation system. Keeping main as up to date as possible will be important to avoid having to deal with drift in implementations between the new shader generation system and the current one.

Development Steps

The following are the concrete steps I think we can take to initiate this process. One big concrete benefit of laying out these steps is that the process can land in the public repo in a series of functioning phases, helping everyone understand the design process and give feedback along the way.

Development framework setup

Before development work starts on the new shader generation system, we should create a robust development environment and testing framework to allow us to work quickly with confidence.

  • Create a new git branch.
  • Replicate the existing shader generation modules to the new module name.
  • Replicate all the existing unit tests, retargeting them at the new shader generation module - we want at least the same level of test coverage for the new system.
  • Create the source code comparison validation test, and validate that all MaterialX files in the test suite pass. They should, since we haven't changed any code yet!

Now we should be at a point where developers can start work updating the new shader generation module with the recommendations above.

Proposed order of work

I think there are some pretty straightforward steps we can take in a specific order, where at the end of each step we should have a system that still passes all the tests.

  • Refactor the existing shader generation system to separate the "create" and "emit" phases.
  • Refactor the existing shader generation system to remove all local storage of MaterialX data model objects.
  • Write an interface class hierarchy that matches the existing MaterialX data model, update the actual MaterialX data model to inherit from this interface hierarchy, and update the entire shader generation system to accept this new set of classes.
  • Flatten this new interface hierarchy to remove the inheritance. The original MaterialX data model can still retain the inheritance, but each class will inherit directly from the corresponding interface class. Now we have an interface where each method for each object type is concretely only for that specific type.
  • Rewrite the interface to remove the classes in favor of flat stateless functions. This initially means passing the shared pointer as an argument to the function.
  • Given that the shader generation system no longer owns any of the incoming data model, all the current shared pointer references should be safely convertible to const raw pointers.
  • These pointers can then be refactored into "handles", introducing a more abstract way for clients to reference the objects needed for shader generation.
  • With a number of things now expanded and simplified, this is probably a good point to inspect the resulting interface "API" and see if there are any further points of simplification.

At this point, we should have a functioning system that is fully tested when backed by the MaterialX data model and would be at a position to experiment with alternate data model providers. Once validated against other providers such as OpenUSD/Hydra, we would consider merging this separate module back into the main branch, staged as a system that can be opted into.

Risks / Considerations

Other than the potential loss of development time if this idea/approach doesn't pan out, I think the risks are fairly low, as all the work happens in a separate branch and in a module parallel to the existing system.

We should be able to deploy this system in a released build of MaterialX, alongside the existing shader generation system, allowing for thorough integration testing. Only at the point we want to remove the current shader generation system would we need to make a breaking change release.

ld-kerley avatar Sep 17 '25 04:09 ld-kerley

I very much like this approach. Some basic comments, which I'm not sure all make sense since you seem to have already worked out a lot of details that I don't fully follow :)

High level items:

  • Very much like the usage of the visitor pattern. I would propose considering support for other visitor pattern implementations, such as validation, serialization, etc., when working out this design.
  • Are we considering a graph or tree abstraction for the model (the inheritance approach)? From what I read, this is not the approach being taken?
  • Instead, I guess this would be a composition-based approach where each backend provides accessors. A public API of accessors would provide a way for any 3rd-party graph to be supported.

Lower level items:

  • It would be nice if the run-time model were easily serializable. Then it could be saved / transmitted as needed.
  • Would you be keeping the concepts of graphs vs. nodes? It seems useful to preserve the current encapsulation, and even graph hierarchies, if that's agreeable.
  • It would be nice if graph referencing worked (e.g. to support this property in UsdShade, for instance).

My initial 2-cents :).

kwokcb avatar Sep 17 '25 15:09 kwokcb

@kwokcb - Thanks for the quick feedback. I'm happy to elaborate on any of the points - either here or in the meeting tomorrow (hopefully you can make it) - if my thoughts aren't clear above.

The main idea here is to not, at least initially, change the internals of the shader generation system, and only focus on feeding data in via an interface that isn't constrained to the MaterialX data model API. So internally things would largely remain the same, and the shape of the incoming model would be the same (a nodegraph of nodes to represent a material).

Visitor Pattern for other API surfaces - I like this idea, but in order to keep the scope of this work manageable, I'd propose we tackle this as a secondary phase. It's not clear to me if a single Visitor Interface is ideal, or if the different API surfaces might want different visitor interfaces. I would also add the upgrade function as a candidate for "visitation" too.

It had occurred to me that being able to serialize the ShaderGraph/ShaderNode representation could be useful/interesting - again - I would propose we add this later.

Similarly, any other internal changes to the shader gen - I would initially consider out of scope for the first phase of this work. I have been thinking more about the ShaderGraph/ShaderNode representation - and I think I would like to see it more closely mimic the input document initially - and separate out the flattening/instancing of nodegraphs/nodedefs. I think there could be shader gen backends that could take advantage of a non-flattened graph at some point.

ld-kerley avatar Sep 17 '25 16:09 ld-kerley

This is a great proposal! A clean separation and abstraction here is something we've been considering since the early days of the codegen system, but never got around to fully implementing. The internal ShaderGraph/ShaderNode model was one step in this direction, but as you note there are still dependencies on the MaterialX document/element model that leaked into it.

I agree a clean separation would benefit a lot of integrations, being able to do codegen directly from other data models. We had precisely this issue back when Bernard and I were at Autodesk, having to move back and forth between internal data and a MaterialX document to do codegen.

I'm not sure I follow all details of the proposed working order, but I think the general steps you propose are spot on.

I also think the proposed approach of creating this in a new MaterialX module is the best way forward, making it opt-in to avoid breaking changes during development.

I have been thinking more about the ShaderGraph/ShaderNode representation - and I think I would like to see it more closely mimic the input document initially - and separate out the flattening/instancing of nodegraphs/nodedefs. I think there could be shader gen backends that could take advantage of a non-flattened graph at some point.

Yes 100% agree, and there are known issues with the current implementation where not preserving the graph hierarchy leads to bugs/limitations in the generated code. For example, this issue: #1976. This part is maybe something we should look at fixing in the existing MaterialXGenShader module first.

niklasharrysson avatar Sep 26 '25 08:09 niklasharrysson

@niklasharrysson thanks for the feedback - all of this work is really standing on the shoulders of the giants (like yourself and Bernard) that came before!

To be clear, the work proposed here intentionally will not change any of the internal functionality of the shader generation system at first, to ensure that while the interface for providing the data is refactored we don't introduce any new bugs - but also don't fix any existing ones.

Once we have solid confidence in the parity, then we would start to evolve the new system to try and address some of these other issues.

ld-kerley avatar Sep 26 '25 16:09 ld-kerley

Sounds good to keep it as minimal as possible.

I'll just note that changing the interface and removing dependencies on MaterialXCore classes may require significant internal refactoring, since those classes are currently referenced and accessed throughout the implementation. That said, this can be addressed incrementally as the work progresses.

niklasharrysson avatar Sep 28 '25 15:09 niklasharrysson

I haven't looked at this too closely yet - but I was considering that perhaps we might want to keep MaterialX::Value in the interface. This object isn't intrinsically tied to the current XML representation. To keep things clean, it might imply creating a new base-level module (MaterialXBase?) that contains these lowest-level data types and continues to share them between shader generation and the Core. The file handling code in MaterialXFormat might also need to be investigated, as it's also shared.

The underlying goal is to avoid having to reconstitute the MaterialX XML document. I think we could reuse some of the non-XML-based pieces of Core, but we may want to separate them to make the decoupling cleaner.

ld-kerley avatar Sep 28 '25 23:09 ld-kerley

Yes, as you note, Value is used throughout MaterialXGenShader. And then there's the question of NodeDef / Implementation. In order to create the ShaderGraph with ShaderNodes, you need the node definitions in some form, as well as their implementation (shader code or graph).

It depends on how you envision the shadergen "create" phase. The "create" interface should be data-model agnostic, so you create nodes with ports and values using API calls on some handle. And I guess an API for registering implementations could be agnostic as well. So then it's only the application side that needs access to NodeDef/Implementation, to feed this API.

Most integrations would probably want to have a MaterialX document loaded to access this for the standard libraries (stdlib, pbrlib, etc.). How is that handled in USD? Is there a separate representation / registry of implementations kept in USD, that could feed such an API directly? Or is there a MaterialX document loaded to access the node definitions?

niklasharrysson avatar Oct 01 '25 14:10 niklasharrysson

I haven't thought too deeply yet - but I was sort of thinking that we would include the NodeDef/Implementation/NodeGraph classes in the visitor interface - allowing others to provide their own backing for those constructs.

I haven't started poking at the USD side yet - but yes, I can imagine a v0.1 where a lot of these things are still sourced from the MaterialX data library - but having them in the interface allows integrations to naturally pick and choose how much to use. USD has its own Shader Definition Registry, which might not contain all the info necessary today, but I think ideally we might investigate bringing the two sides to meet each other somewhere in the middle, over time.

ld-kerley avatar Oct 01 '25 16:10 ld-kerley

Ah cool. I haven't wrapped my head around the interfaces yet, but I’m sure it’ll be clear later.

niklasharrysson avatar Oct 02 '25 08:10 niklasharrysson

One of the main motivations for trying to do this work in small chunks - rather than one large PR - is so we can all work on this collaboratively. Interface decision points will be collective decisions, and not necessarily set in stone either, as the work will be in a feature branch and so can evolve as we learn more.

@jstone-lucasfilm - I believe you were going to make the branch for this - did you get a chance to do that yet?

ld-kerley avatar Oct 02 '25 15:10 ld-kerley

@ld-kerley Not yet, and I'm hoping to have a chance to focus on this next week, as we're wrapped up in high-priority MaterialX presentations and recordings this week.

jstone-lucasfilm avatar Oct 02 '25 16:10 jstone-lucasfilm