20240610.1.0: still issues related to Visual Studio update with C++ libraries run through Java
Description
GDAL has been affected by https://github.com/actions/runner-images/issues/10004, the std::mutex issue introduced by the 20240603 image update. Now that 20240610 has been released, I've tried re-enabling GDAL Windows CI testing. The tests mostly pass, except for the Java-related tests, which run my C++ library through JNI. I suspect the JVM shipped in the image uses a VC redist that is too old.
Cf https://github.com/OSGeo/gdal/actions/runs/9487561784/job/26144477503?pr=10198 for a faulty run
The following tests FAILED:
1 - java_GDALOverviews (Failed)
2 - java_gdalinfo (Failed)
3 - java_ogr2ogr_1 (Failed)
4 - java_ogrinfo_1 (Failed)
5 - java_ogrinfo_2 (Failed)
6 - java_ogr2ogr_2 (Failed)
7 - java_ogr2ogr_3 (Failed)
8 - java_ogrinfo_3 (Failed)
9 - java_OSRTransform (Failed)
10 - java_gdalmajorobject (Failed)
11 - java_GDALTestIO (Failed)
12 - java_GDALTestMultiDim (SEGFAULT)
13 - java_GDALContour (Failed)
15 - java_ogrtindex (Failed)
16 - java_OSRTest (Failed)
9: #
9: # A fatal error has been detected by the Java Runtime Environment:
9: #
9: # EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffd926192a0, pid=3372, tid=0x0000000000001464
9: #
9: # JRE version: OpenJDK Runtime Environment (8.0_412-b08) (build 1.8.0_412-b08)
9: # Java VM: OpenJDK 64-Bit Server VM (25.412-b08 mixed mode windows-amd64 compressed oops)
9: # Problematic frame:
9: # C [msvcp140.dll+0x192a0]
9: #
9: # Core dump written. Default location: D:\a\gdal\gdal\build\swig\java\hs_err_pid3372.mdmp
9: #
9: # An error report file with more information is saved as:
9: # D:\a\gdal\gdal\build\swig\java\hs_err_pid3372.log
Platforms affected
- [ ] Azure DevOps
- [X] GitHub Actions - Standard Runners
- [ ] GitHub Actions - Larger Runners
Runner images affected
- [ ] Ubuntu 20.04
- [ ] Ubuntu 22.04
- [ ] Ubuntu 24.04
- [ ] macOS 11
- [ ] macOS 12
- [ ] macOS 13
- [ ] macOS 13 Arm64
- [ ] macOS 14
- [ ] macOS 14 Arm64
- [ ] Windows Server 2019
- [X] Windows Server 2022
Image version and build link
Runner Image
Image: windows-2022
Version: 20240610.1.0
Is it regression?
Last good build: https://github.com/OSGeo/gdal/actions/runs/9392254916/job/25869181603?pr=10145 (Version: 20240514.3.0)
Expected behavior
should not segfault
Actual behavior
segfaults
Repro steps
Build a C++ library using std::mutex with a JNI interface, and run it with the JVM provided in the image
A workaround I found is to remove msvcp140.dll from "C:/hostedtoolcache/windows/Java_Temurin-Hotspot_jdk" with https://github.com/OSGeo/gdal/pull/10198/commits/95d092d2c59961b7580add8d8736434a6c43e587
@rouault - Thank you for bringing this issue to us. We are looking into it and will update you.
Same issue, even after deployment of runner image version 20240610.1.0 which was supposed to fix the issue: https://github.com/actions/runner-images/issues/10020#issuecomment-2168449045
Even assuming the image gets fixed to update all the runtime DLLs that are shipped by every application and version of Java on the agent, that just means we will build and ship Java SDKs that will blow up in a similarly spectacular and hard-to-diagnose way on the machines of customers who have default, unmodified versions of Java installed.
This does not seem like a good compatibility experience for our customers. Would it be better to revert to the previous version of Visual Studio?
A workaround I found is to remove msvcp140.dll from "C:/hostedtoolcache/windows/Java_Temurin-Hotspot_jdk" with OSGeo/gdal@95d092d
Since there is no way to ensure that all Java users have JVM installations with the latest vcruntime, I plan to define _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR instead when building our JNI modules and dependencies. Is this define needed with clang-cl as well as MSVC?
Is this define needed with clang-cl as well as MSVC?
I found the answer. Yes it's needed. Longer answer can be found in #10004.
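For readers who want to see what applying the define looks like, here is a minimal sketch at source level. It assumes the macro is defined before any standard library header in every translation unit of the module. In a real build it would more likely be passed on the compiler command line (for example /D_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR) so that no file can miss it. The names g_counter_mutex and increment_counter are hypothetical and only stand in for library code that is called from the JVM.

```cpp
// Must appear before ANY standard library header in this translation unit.
// The macro is only meaningful to the Microsoft STL; other standard
// libraries simply ignore it.
#define _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR

#include <mutex>

namespace {
// Hypothetical shared state guarded by a std::mutex, as a JNI-exposed
// library typically has somewhere in its code or its dependencies.
std::mutex g_counter_mutex;
int g_counter = 0;
}  // namespace

// Hypothetical library function: locks the mutex and bumps the counter.
// Locking the mutex is the kind of call that ends up in msvcp140.dll.
int increment_counter() {
    std::lock_guard<std::mutex> lock(g_counter_mutex);
    return ++g_counter;
}
```

The define has to be applied consistently, including to statically linked dependencies that use std::mutex, which is why a build-wide compiler option is probably the safer choice than a per-file #define.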
@rouault - I hope your issue got resolved. I can see that a workaround (https://github.com/OSGeo/gdal/commit/95d092d2c59961b7580add8d8736434a6c43e587) was used, removing msvcp140.dll from "C:/hostedtoolcache/windows/Java_Temurin-Hotspot_jdk". Please let us know.
@rouault - Closing this issue. Thank you for bringing it to us. Please feel free to create a new issue if you face any related problems with this Windows Server 2022 image.
I'm late to the party, but FYI for people getting here in the future: the problem is not limited to Java programs.
I don't use any Java, but because the Java_Temurin-Hotspot_jdk bin directory is earlier in $PATH than the system directories, all programs will get their msvcp140.dll from there.
So if you are installing anything on the Windows runner that has been built against MSVC Redist 14.34.31931 or earlier it will fail with this problem.
@jackjansen One thing that I do not understand in above posts or posts in other related issues that mention $PATH: According to the Microsoft documentation, the directories in $PATH are searched last. Especially the system directories come before. Since the VC redistributable installs the DLLs including msvcp140.dll into the system directories (it does, doesn't it?), shouldn't this mean that all applications should pick up the latest msvcp140.dll, and everything on $PATH effectively gets ignored? Am I missing something?
The only case where I can see problems is if an older msvcp140.dll is located in an application directory (i.e. next to the .exe), because the application directory is searched before the system directories. But this will also cause problems only if that application contains or loads code (plugins, Java/Python modules, etc.) which had been built against the newer Microsoft STL which requires the updated msvcp140.dll.
@Sedeniono I think you may be right. In other words: I now think that my problem was not caused by the incorrect old msvcp140.dll that I found on $PATH, but instead by another old version of msvcp140.dll that was loaded into my executable earlier. Sorry for the confusion.
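One way to settle which copy of msvcp140.dll a process has actually loaded is to ask the process itself rather than reason about the search order. The following is a minimal sketch using the Win32 calls GetModuleHandleA and GetModuleFileNameA. The helper name loaded_module_path is hypothetical, and the non-Windows branch exists only so the snippet compiles elsewhere.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: returns the full path of a module that is already
// loaded into the current process, or an empty string if it is not loaded
// (always empty on non-Windows platforms).
std::string loaded_module_path(const char* module_name) {
#ifdef _WIN32
    // Does not load the module; only looks it up among loaded modules.
    HMODULE handle = GetModuleHandleA(module_name);
    if (handle == nullptr) {
        return {};
    }
    char buffer[MAX_PATH];
    DWORD length = GetModuleFileNameA(handle, buffer, MAX_PATH);
    // 0 signals failure; a value equal to the buffer size signals truncation.
    if (length == 0 || length >= MAX_PATH) {
        return {};
    }
    return std::string(buffer, length);
#else
    (void)module_name;
    return {};
#endif
}
```

Calling loaded_module_path("msvcp140.dll") from inside the affected process (for example from the native library just before it first locks a mutex) and logging the result should show whether the DLL came from the application directory, a system directory, or a directory on $PATH.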