[BUG] OpenImageIO crashes without error on some Windows systems

Open kuzynsyna opened this issue 10 months ago • 13 comments

I installed the latest Python (3.13.2) on Windows 10, then installed OpenImageIO v3.0.3.1 using pip3 install OpenImageIO.

On one system it works correctly, but on another, similar system it crashes when running the same command on identical files:

oiio.ImageBuf(file)

I also tested the following:

oiiotool.exe -v --info file

However, it also crashes without displaying any error. Out of four Windows 10 systems, it works on two and fails on the other two.

Does the OpenImageIO installation require any additional dependencies or configurations on the system?

kuzynsyna avatar Feb 18 '25 18:02 kuzynsyna

Hey, two out of four systems is better than zero out of four, right? :)

Thanks for creating this issue -- it's been tricky for us to assess and diagnose Windows-related bugs, because apparently very few of us are familiar with / have access to Windows!

On Slack, we recently discovered that the Windows wheels are building and linking a dynamic TIFF library instead of static, which was causing the OIIO bindings to get confused with other shared TIFF libs found at runtime, in certain contexts. (e.g., they'd work with one version of Nuke's interpreter, but not another). So that's something we'll definitely look into.

Aside from that, there is an environment variable you can set that might coerce the bindings / oiiotool into behaving properly.

Would you try setting OPENIMAGEIO_PYTHON_LOAD_DLLS_FROM_PATH to 1 and then running oiiotool.exe in that environment?

If that causes OIIO to suddenly start working where it had previously failed, that will help us pinpoint what we need to do to get our Windows wheels working reliably.
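
For anyone who wants to script this test, here is a minimal sketch. It assumes Python is available on the affected machine; the helper name run_with_env is made up for illustration and is not part of OpenImageIO. It runs a command with the variable set only for the child process, so the rest of the session is untouched, and a hard crash in the child cannot take down the calling interpreter.

```python
import os
import subprocess


def run_with_env(argv, extra_env=None):
    """Run argv with extra environment variables and return the finished process.

    run_with_env is a hypothetical helper name used only for this sketch.
    """
    env = dict(os.environ)       # copy, so the parent environment is unchanged
    env.update(extra_env or {})  # overlay the variables under test
    return subprocess.run(argv, env=env, capture_output=True, text=True)


# Example usage (assumes oiiotool.exe is on PATH):
# proc = run_with_env(
#     ["oiiotool.exe", "-v", "--info", "file"],
#     {"OPENIMAGEIO_PYTHON_LOAD_DLLS_FROM_PATH": "1"},
# )
# print(proc.returncode, proc.stdout, proc.stderr)
```

The same helper can be pointed at a Python one-liner that imports the bindings, which keeps the import crash isolated in a child interpreter.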

It's very frustrating that the crashes you're experiencing do not produce error messages. We've seen this before with Windows, and it's not clear to me how to force a more verbose output. Is there anything you can tell us about the similarities among / differences between the two system + runtime environments + Python interpreter versions where the OIIO wheels did work vs. where they did not?
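
One thing that might help even when nothing is printed: a silently crashing Windows process still leaves an exit code, and that code is usually an NTSTATUS value. Below is a rough sketch, not an OIIO feature, that turns a return code (for example from Python's subprocess module) into hex and a name. The table only covers a handful of well-known codes; describe_exit_code is a name invented for this example.

```python
# A few well-known NTSTATUS values that Windows reports as a process exit code.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC0000139: "STATUS_ENTRYPOINT_NOT_FOUND",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_code(returncode):
    """Render a process return code as hex, plus a name when it is recognised."""
    if returncode == 0:
        return "0x00000000 (success)"
    code = returncode & 0xFFFFFFFF  # normalise negative values to unsigned 32-bit
    name = KNOWN_STATUS.get(code, "unknown")
    return f"0x{code:08X} ({name})"
```

A DLL-not-found or entry-point-not-found code would point at a missing or too-old dependency, while an access violation is more consistent with an ABI mismatch at runtime.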

zachlewis avatar Feb 21 '25 15:02 zachlewis

Unfortunately, setting OPENIMAGEIO_PYTHON_LOAD_DLLS_FROM_PATH did not help—there's still no error message or any effect. The system is Windows 10 with the latest updates, and Python and OpenImageIO are the same on all systems. The only difference could be in the additional software installed on these systems.

kuzynsyna avatar Feb 23 '25 19:02 kuzynsyna

This might be due to the MSVC Redist (C runtime et al.) package being installed on the 2 machines that work. If you look at the list of installed programs, do you see something like "Microsoft Visual C++ 2015-2022 Redistributable (x64) ..." on the 2 machines that work but not on the machines that don't work?
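
As an alternative to reading the installed-programs list, a quick presence check can be scripted. This is only a sketch: the DLL list is an assumption about what the x64 redistributable provides, not something taken from the wheel itself, and missing_dlls is a hypothetical helper. It also only tells you whether the DLLs load, not whether they are new enough.

```python
import ctypes
import sys

# Core DLLs shipped by the VC++ 2015-2022 x64 redistributable (assumed list).
RUNTIME_DLLS = ("vcruntime140.dll", "vcruntime140_1.dll", "msvcp140.dll")


def missing_dlls(names=RUNTIME_DLLS, loader=None):
    """Return the names that could not be loaded with the given loader."""
    if loader is None:
        if sys.platform != "win32":
            raise RuntimeError("this check only makes sense on Windows")
        loader = ctypes.WinDLL  # uses the normal Windows DLL search order
    missing = []
    for name in names:
        try:
            loader(name)
        except OSError:  # raised when the DLL cannot be found or loaded
            missing.append(name)
    return missing


# Example usage on Windows:
# print(missing_dlls() or "all runtime DLLs loaded")
```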

jessey-git avatar Feb 24 '25 07:02 jessey-git

I compared this, and the same versions were installed on both the working and the failing systems. I didn't find any differences in this regard.

kuzynsyna avatar Feb 24 '25 08:02 kuzynsyna

Thank you for checking the LOAD_DLLS_FROM_PATH environment variable thing, that crosses one issue off the list...

I don't suppose I could get you to install uv (https://github.com/astral-sh/uv) on the trouble systems...? I'd like to separate out as many confounding variables as possible, and we can use the uvx command to install and run oiiotool --buildinfo in an isolated virtual environment, with an arbitrary CPython interpreter if necessary.

Three tests I'd like to try:

  1. $ uvx --from openimageio oiiotool --buildinfo -- test to see if the OIIO wheels are conflicting with something else in your python environment
  2. $ uvx --python 3.13 --from openimageio oiiotool --buildinfo -- same as above, but will fetch and locally install CPython 3.13 first -- this will allow us to determine if there's a potential conflict with your normal Python interpreter.
  3. $ uvx --from git+https://github.com/academysoftwarefoundation/openimageio.git oiiotool --buildinfo -- will attempt to locally build the OIIO wheels directly from the main branch, instead of using a bdist from pypi.org
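
If it is easier to run all three in one go and paste the results back, something like the following could work. It is a sketch that assumes uv is already installed and uvx is on PATH; run_tests and UVX_TESTS are names made up here. The commands themselves are copied from the list above.

```python
import subprocess

# The three commands from the list above, as argument vectors.
UVX_TESTS = [
    ["uvx", "--from", "openimageio", "oiiotool", "--buildinfo"],
    ["uvx", "--python", "3.13", "--from", "openimageio", "oiiotool", "--buildinfo"],
    ["uvx", "--from",
     "git+https://github.com/academysoftwarefoundation/openimageio.git",
     "oiiotool", "--buildinfo"],
]


def run_tests(commands=UVX_TESTS, runner=subprocess.run):
    """Run each command and collect (command line, return code, combined output)."""
    results = []
    for argv in commands:
        label = " ".join(argv)
        try:
            proc = runner(argv, capture_output=True, text=True)
            results.append((label, proc.returncode, proc.stdout + proc.stderr))
        except OSError as exc:  # e.g. uvx is not on PATH
            results.append((label, None, str(exc)))
    return results


# Example usage:
# for label, code, output in run_tests():
#     print(f"{label}\n  exit code: {code}\n  {output.strip()}")
```

A return code of None means the command could not be started at all, which is different from oiiotool starting and then crashing.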

zachlewis avatar Feb 24 '25 11:02 zachlewis

I can reproduce the problem on a blank VM here. However, I "solved" it by installing the VS redist package. This is definitely a problem as users shouldn't need to install this package. The wheel should probably be changed to ship with the crt stubs or link the crt statically.

However, if that's not solving op's problem, then I suppose there's another issue at play...

[EDIT] A "working" install: Image

A "broken" install - may or may not be different than this issue here: Image

jessey-git avatar Feb 24 '25 17:02 jessey-git

This does smell like a VC Runtime version difference between the build and runtime environments...

We recently ran into issues with OCIO crashing at initialization time due to runtime ABI differences. These turned out to be due to changes to the ABI that made the std::mutex constructor constexpr, where a critical section initialized in the old non-constexpr constructor was supplanted with zero-initialized memory in the newer constexpr version. As a result, the older version of the runtime was trying to wrangle a now-uninitialized critical section, with predictably catastrophic results.

More detailed breakdown: https://github.com/actions/runner-images/issues/10004#issuecomment-2156109231

So, the first question I would probably ask is: what version of the VC runtime is installed on the runner that produces the OIIO build that's bundled in the Python wheel?
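
For the other half of that comparison, the version installed on a user's machine can be read from the registry. The sketch below assumes the key location Microsoft documents for the VC++ 2015-2022 x64 redistributable and a "Version" string of the form "v14.x.y.z"; the sample string in the docstring only illustrates the format, and on some setups the key may live under WOW6432Node instead. The function names are invented for this example.

```python
import sys

# Registry location documented by Microsoft for the VC++ 2015-2022 x64 runtime
# (assumption: the key is visible from the registry view this Python uses).
VC_RUNTIME_KEY = r"SOFTWARE\Microsoft\VisualStudio\14.0\VC\Runtimes\x64"


def parse_version(text):
    """Turn a string such as 'v14.42.34433.00' into a tuple of ints."""
    return tuple(int(part) for part in text.lstrip("vV").split("."))


def installed_vc_runtime():
    """Return the installed runtime version as a tuple, or None if unavailable."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only module, so import it lazily

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, VC_RUNTIME_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "Version")
    except OSError:  # key or value missing: redistributable probably not installed
        return None
    return parse_version(value)


# Example usage on Windows:
# print(installed_vc_runtime())
```

Because the result is a tuple of ints, two machines can be compared directly with the usual comparison operators.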

nrusch avatar Feb 25 '25 07:02 nrusch

Installing the latest version of VC Runtime resolved the issue. Thanks, everyone, for your help!

kuzynsyna avatar Feb 25 '25 09:02 kuzynsyna

Thanks for helping to track down what's going on here, everyone.

We recently ran into issues with OCIO crashing at initialization time due to runtime ABI differences. These turned out to be due to changes to the ABI... [a]s a result, the older version of the runtime was trying to wrangle a now-uninitialized critical section, with predictably catastrophic results.

Hmm. Good hunting. What I can tell you is, our GHA workflow for building the wheels was originally basically just copy-pasted from OCIO's wheels workflow, so it's extremely likely that the same VC runtime ABI incompatibilities present in OCIO are also found here.

So, the first question I would probably ask is: what version of the VC runtime is installed on the runner that produces the OIIO build that's bundled in the Python wheel?

Welp, we're using the "windows-2022" runner, the details of which can be found here. I see specifically:

Microsoft.Component.VC.Runtime.UCRTSDK - 17.13.35710.127

The wheel should probably be changed to ship with the crt stubs or link the crt statically.

@lgritz, I know you've experienced problems with linking static dependencies on Windows, which is why we haven't been doing so by default; but I think it might be worth further exploration, to see if we can nail down what specifically causes trouble.

I'm reading that we can globally link the CRT statically in the top-level CMakeLists.txt with:

set(CMAKE_MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>")
cmake_policy(SET CMP0091 NEW)  # ensures that CMake obeys the above property

(which corresponds to the /MT and /MTd compiler flags for statically linking the CRT)

Does that sound like a good fix to everyone?
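
To check afterwards whether the change took effect, dumpbin /dependents on the built .pyd or .exe is the authoritative tool. As a rough stand-in that runs anywhere, the sketch below simply looks for the dynamic CRT DLL names in the file's bytes. It is a heuristic with a made-up function name, not a real PE parser, so treat a hit as a hint rather than proof.

```python
from pathlib import Path

# DLL names whose presence suggests a dynamically linked MSVC runtime.
DYNAMIC_CRT_NAMES = (b"vcruntime140.dll", b"msvcp140.dll")


def mentions_dynamic_crt(binary_path):
    """Heuristic: list the dynamic CRT DLL names that appear in the file's bytes."""
    data = Path(binary_path).read_bytes().lower()  # import names vary in case
    return [name.decode() for name in DYNAMIC_CRT_NAMES if name in data]


# Example usage (the path is a placeholder):
# print(mentions_dynamic_crt("OpenImageIO.pyd"))
```

An empty list from a release build would be consistent with the CRT having been linked statically.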

zachlewis avatar Mar 03 '25 14:03 zachlewis

Does that sound like a good fix to everyone?

Ok for release, but I think the license prevents static linking of debug CRT (the distribution thereof really). (See also the linked PR there that only handles release, but leaves debug linking undefined and up to the local buildtime option...)

kmilos avatar Mar 06 '25 08:03 kmilos

@lgritz, I know you've experienced problems with linking static dependencies on Windows, which is why we haven't been doing so by default; but I think it might be worth further exploration, to see if we can nail down what specifically causes trouble.

If you can make it work, go for it. Anything I say about Windows should be viewed through the lens of knowing that I have minimal development experience on Windows (unless you count 80s and early 90s era MSDOS) and literally no current access to any Windows machine other than by pushing to GitHub and trying to read the tea leaves of the CI logs. In other words, never treat my observations of what I was able to make work (or not) on Windows as prescriptive for how it should be done.

Thanks for the observation, @kmilos. I believe that the only distribution of binaries that we do is release mode, so I don't think we're limited by the licensing on the debug CRT.

lgritz avatar Mar 07 '25 18:03 lgritz

Just to be safe, I'll do:

set(CMAKE_MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:DebugDLL>" CACHE STRING "")

I'll wrap this into a PR I'm working on for an "OpenImageIO_BDIST" mode similar to "OpenImageIO_CI", but for setting and doing stuff specifically relevant for building binary distributions (like handling of LGPL dependencies, linking zlib-ng, ignoring homebrewed deps, etc.)

zachlewis avatar Mar 10 '25 22:03 zachlewis

We've seen a similar issue regarding the mutex stuff with Blender and OpenTimelineIO recently: OTIO was built against a newer version of the runtime and took down Blender. We found that building OTIO with /D_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR sidestepped the issue, so if you don't want to statically link the runtime, this may be an option as well.

LazyDodo avatar Mar 14 '25 15:03 LazyDodo