WIP: add MDL Compiler and Render
Draft to add the MDL compiler and a renderer to the repo to simplify development and testing. Note, everything can be discussed. I just started with the approach I find most valuable.
- ~pull vcpkg as a submodule and integrate it into cmake using vcpkg manifest mode~
- vcpkg needs to checked out separately, and VCPKG_ROOT has to be defined during cmake configure.
- vcpkg can be user for other dependencies as well: oiio, imgui, etc. independent of MDL
- vcpkg pulls the MDL SDK only when only the MDL compiler for [genmdl] tests is needed
- vcpkg pulls oiio for MDL image plugins, also vulkan, gltf, glslang, etc. for the render application
- there are two cmake options that are disabled at the moment
MATERIALX_BUILD_MDL_COMPILERandMATERIALX_BUILD_MDL_RENDER. If there are enabled cmake configure and generate will compile dependencies which increases the build time drastically. So be aware. If the options are not set, the build time shouldn't increase notably
As an alternative I'm planning to add option to instead download precompiled MDL binaries for faster build times. However, vulkan for instance is probably still pulled via vcpkg ideally.
marking this as PR as WIP to get feedback. note, the openpbr changes are added for testing only and will be removed. I also switched CRT to static for the time being. With the next MDL SDK release we should be able to dynamic CRT.
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: jreichel-nvidia / name: Joachim Reichel (3a4a4d980ec5408aa2369a4644be94fef0573470, 4128cb7f883ba3f662cb1d96afd0ece0fdc27e10, 41ae08e797f8c085fa4d76d4d705a255efd987df, 4900f510b79240cd2e7a14e660b68eeb7cb434a7, 49ece6804b404649e225094b587f4ccc0c393949, a36402637db51d9f534d798404fc11f8c81098fd)
@jstone-lucasfilm @ld-kerley can we merge this MR? Lee has two MRs that effect the MDL code gen and a test run would be nice. And we also have a followup for the [rendermdl] test, that work with an external renderer via cmake variable.
@krohmerNV I agree that we should review and merge your work on MDL testing first, followed by the related PRs that affect MDL shader generation, but my sense is that all of these updates should wait until after the release of MaterialX 1.39.4.
In order to get this work on MDL testing ready to merge, we'll need to address the significant increase in build times, which rises from roughly 12 minutes to more than an hour.
One path forward would be to add a nightly build of MaterialX, with the MDL test suite included in only that build. This approach would also extend to the future inclusion of MaterialX testing with OpenImageIO, which similarly takes too long to be included in our builds that are run for every commit.
About build times: this is why we added caching for vcpkg artefacts. The build time of ~1h in your commit occurs when there are no cache hits (probably because the vcpkg version changed in the last three weeks). With a warm cache the CI run takes 15 minutes (see my dummy commit above).
The images change roughly once per week (https://github.com/actions/runner-images/commits/main/images/ubuntu/Ubuntu2404-Readme.md), which likely includes the vcpkg version. AIUI once per week the first CI run after an image change would experience the longer build times, but not the others. Is that good enough to address the build time increase?
A more thorough CI run in a nightly build might make sense, though. I just wonder the genmdl test with mdlc needs to be distabled in the regular MR runs ...
@jreichel-nvidia I wonder if the presence of the cache just makes our build times more unpredictable, which is potentially a big disadvantage over the nightly build approach. Thinking of a new developer getting started with MaterialX for the first time, their first GitHub Actions run would always take the maximum possible time, since their cache would always be cold, and I'm concerned about the effect that this would have on Dev Days participants and other learners.
We definitely want to integrate your MDL testing work as soon as possible, ideally at the beginning of the development cycle for MaterialX 1.39.5, and I wonder if the best approach for now would be to make MDL builds opt-in through a GitHub Actions matrix flag that defaults to false.
This would allow teams to easily enable MDL builds on demand in the short term, and over time we would implement proper nightly builds that include both MDL and OIIO testing. This would create a framework that would extend naturally to OSL testing as well, where that build process is even longer than the one-hour build processes that we expect for MDL or OIIO.
What are your thoughts?
@jstone-lucasfilm About new developers: The cache is not per account, but per project and shared between MRs. Each CI run should face the same risk of an outdated cache. But the build times are more unpredictable, yes.
My biggest concern with the nightly test is that one does not get the per-MR feedback and might unknowingly break tests. (Plus someone would regularly need to review test results, follow up on failures etc.)
Disabling the test matrix flags would be ok for me if that allows the MR getting merged.
What do you think about this experiment: Give it a try and merge the MR with the flags enabled. If the unpredictable build time becomes problematic you could still decide to disable it. And moving that test to the nightly tests later would also still be an option.
@jreichel-nvidia Is the cache shared between the main repository and all forks of that repository? The case I'm concerned about is a new developer working in a fork of MaterialX, and experiencing a one hour build that discourages them from using GitHub Actions. I've had that experience with other projects with GitHub Actions workflows that are so unwieldy that I don't end up using them at all.
@jstone-lucasfilm Right, I forgot about forks. Their cache is empty on the very first MR.
I guess that means disabling the feature for now, and enabling it only later in the nightly tests. Is the MR fine apart from that (and merging/resolving conflicts)?
Bringing an internal NVIDIA thread back to the main PR, here are the next steps that I would recommend:
- Let's reframe this as a validation step that will occur in a nightly build, and remove all of the infrastructure for CI caching.
- For now, you can just include a
build_mdlflag in CI, along with the build step that it invokes, without setting the flag in the matrix. - Once these changes are made, go ahead and highlight your updated PR in the MaterialX channel of the ASWF Slack, so that we can get the largest section of the community reviewing your work.
- I can only recommend to keep the vcpkg cache. Even if vcpkg is only used in nightly builds, devs will have to test changes for that mode and a quick CI turnaround time is very useful (working on github actions is already enough pain by itself). I can remove caching again, but for CI turnaround time I would suggest to delay that until it is clear that no other larger changes are required.
- There is already a
build_mdl_sdkflag. Is that what you meant? Do you want me to rename it? Still to be disabled of course. (There is also the additionalcmake_configargument. The reason is that its value depends on the OS, but in the matrix I can't reference the variables above. This would be trivial in any reasonable scripting language, but YAML ...)
@jreichel-nvidia My sense is that the most important parts of this changelist are the new capabilities for MDL validation, and in order to get this merged soon, I think it will really help to make the PR as minimal as it can possibly be. We can then write up an independent PR where ideas for CI caching can be independently reviewed and tested across multiple forks of MaterialX, without slowing down the review and integration of this important work.
I've put together a first PR that actually implements a nightly MaterialX build, and my recommendation would be to structure your new MDL test as a parallel of the existing OIIO test, with a new extended_build_mdl flag that mirrors the existing extended_build_oiio:
https://github.com/AcademySoftwareFoundation/MaterialX/pull/2621
We'll focus on getting the nightly build PR merged ASAP, so that you can test it directly with your MDL PR, making sure that the MDL test passes when you manually dispatch the main workflow in your fork of MaterialX.