bazel icon indicating copy to clipboard operation
bazel copied to clipboard

FATAL: bazel crashed due to an internal error: UnsatisfiedLinkError nativeIsSymlinkOrJunction

Open jayconrod opened this issue 1 year ago • 9 comments

Description of the bug:

When running bazel --output_user_root=C:\_b clean --expunge in our Windows CI environment, we get the following error.

java.lang.UnsatisfiedLinkError: 'int com.google.devtools.build.lib.windows.WindowsFileOperations.nativeIsSymlinkOrJunction(java.lang.String, boolean[], java.lang.String[])'
	at com.google.devtools.build.lib.windows.WindowsFileOperations.nativeIsSymlinkOrJunction(Native Method)
	at com.google.devtools.build.lib.windows.WindowsFileOperations.isSymlinkOrJunction(WindowsFileOperations.java:141)
	at com.google.devtools.build.lib.windows.WindowsFileSystem.isSymlinkOrJunction(WindowsFileSystem.java:275)
	at com.google.devtools.build.lib.windows.WindowsFileSystem.fileIsSymbolicLink(WindowsFileSystem.java:135)
	at com.google.devtools.build.lib.windows.WindowsFileSystem.stat(WindowsFileSystem.java:158)
	at com.google.devtools.build.lib.vfs.FileSystem.statNullable(FileSystem.java:525)
	at com.google.devtools.build.lib.vfs.FileSystem.readdir(FileSystem.java:646)
	at com.google.devtools.build.lib.vfs.Path.readdir(Path.java:268)
	at com.google.devtools.build.lib.exec.BinTools.scanDirectoryRecursively(BinTools.java:132)
	at com.google.devtools.build.lib.exec.BinTools.forProduction(BinTools.java:80)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.newRuntime(BlazeRuntime.java:1299)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.serverMain(BlazeRuntime.java:1024)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.main(BlazeRuntime.java:769)
	at com.google.devtools.build.lib.bazel.Bazel.main(Bazel.java:95)

Oddly, our regular bazel test command succeeds, when run immediately after. The only difference in configuration I can think of is that the bazel test command is invoked with --nosystem_rc. Our system rc file just contains startup --output_user_root=C:\\_b, so I don't think it makes a difference for this purpose.

Which category does this issue belong to?

No response

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

The command that fails is:

bazel --output_user_root=C:\_b clean --expunge

I'm not sure what causes this. I've seen it happen in our CI environment, but not when that command is run directly from a shell. I couldn't reproduce it on a dev machine.

Which operating system are you running Bazel on?

Windows Server 2022, on x86_64

What is the output of bazel info release?

release 8.0.0-pre.20231030.2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

This is via Bazelisk v1.19.0.

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

n/a

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

Not sure. We first saw this about a week after we upgraded from 7.0.0-pre.20230926.1 to 8.0.0-pre.20231030.2.

Have you found anything relevant by searching the web?

#19250 has the same error message, but that happened on ARM64 whereas I'm seeing this on x86_64. It also seems like that was more consistently reproducible.

Any other information, logs, or outputs that you want to share?

cc @meteorcloudy in case you might know something from #19250 which would affect x86_64 too.

jayconrod avatar Dec 26 '23 23:12 jayconrod

I couldn't reproduce this issue on a Windows VM.

pcloudy@BAZEL-WINDOWS-P C:\Users\pcloudy\workdir\bazel>bazel version
Bazelisk version: v1.16.0
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
Build label: 8.0.0-pre.20231030.2
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Tue Nov 14 23:02:05 2023 (1700002925)
Build timestamp: 1700002925
Build timestamp as int: 1700002925

pcloudy@BAZEL-WINDOWS-P C:\Users\pcloudy\workdir\bazel>bazel --output_user_root=C:\_b --nosystem_rc clean --expunge
Starting local Bazel server and connecting to it...
INFO: Starting clean.

Anything else useful for debugging? Does it still happen with USE_BAZEL_VERSION=last_green?

meteorcloudy avatar Jan 02 '24 14:01 meteorcloudy

I tried setting USE_BAZEL_VERSION=last_green, still crashes. I noticed bazel version also crashes in this configuration, so it's not an issue with bazel clean specifically.

As I mentioned earlier, we do have a script that runs bazel test successfully. It's in a different configuration, but I can go through and figure out what the difference is. I'm guessing this is a misconfiguration on my side.

What would cause this error though? I'm guessing Bazel needs to extract its JNI DLL and then fails to load that, but why might that happen?

Here's the output from bazel version

WARNING: The following rc files are no longer being read, please transfer their contents or import their path into one of the standard rc files:
c:\programdata\bazel.bazelrc
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Reading 'startup' options from c:\actionsrunner\_work\engflow\engflow\.bazelrc: --host_jvm_args=-DBAZEL_TRACK_SOURCE_DIRECTORIES=1 --host_jvm_args=-XX:InitialRAMPercentage=25 --host_jvm_args=-XX:MaxRAMPercentage=70
Server crashed during startup. Now printing c:\_b\bpc5uood\server\jvm.out
OpenJDK 64-Bit Server VM warning: Options -Xverify:none and -noverify were deprecated in JDK 13 and will likely be removed in a future release.
FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.UnsatisfiedLinkError: 'int com.google.devtools.build.lib.windows.WindowsFileOperations.nativeIsSymlinkOrJunction(java.lang.String, boolean[*** java.lang.String[***)'
	at com.google.devtools.build.lib.windows.WindowsFileOperations.nativeIsSymlinkOrJunction(Native Method)
	at com.google.devtools.build.lib.windows.WindowsFileOperations.isSymlinkOrJunction(WindowsFileOperations.java:141)
	at com.google.devtools.build.lib.windows.WindowsFileSystem.isSymlinkOrJunction(WindowsFileSystem.java:275)
	at com.google.devtools.build.lib.windows.WindowsFileSystem.fileIsSymbolicLink(WindowsFileSystem.java:135)
	at com.google.devtools.build.lib.windows.WindowsFileSystem.stat(WindowsFileSystem.java:158)
	at com.google.devtools.build.lib.vfs.FileSystem.statNullable(FileSystem.java:5[25](https://github.com/EngFlow/engflow/actions/runs/7388647371/job/20099887654#step:3:25))
	at com.google.devtools.build.lib.vfs.FileSystem.readdir(FileSystem.java:646)
	at com.google.devtools.build.lib.vfs.Path.readdir(Path.java:[26](https://github.com/EngFlow/engflow/actions/runs/7388647371/job/20099887654#step:3:26)8)
	at com.google.devtools.build.lib.exec.BinTools.scanDirectoryRecursively(BinTools.java:132)
	at com.google.devtools.build.lib.exec.BinTools.forProduction(BinTools.java:80)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.newRuntime(BlazeRuntime.java:1302)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.serverMain(BlazeRuntime.java:10[27](https://github.com/EngFlow/engflow/actions/runs/7388647371/job/20099887654#step:3:27))
	at com.google.devtools.build.lib.runtime.BlazeRuntime.main(BlazeRuntime.java:771)
	at com.google.devtools.build.lib.bazel.Bazel.main(Bazel.java:95)

jayconrod avatar Jan 02 '24 17:01 jayconrod

The DLL where the nativeIsSymlinkOrJunction method is defined is loaded elsewhere (specifically here). If it hasn't been loaded by the time we get to that code, it's too late. But, for reasons, a failure to load the JNI doesn't produce an error, it only prints a warning to the server log.

Is there anything in the server log (cat $(bazel info server_log)) indicating that the JNI wasn't successfully loaded?

tjgq avatar Jan 02 '24 17:01 tjgq

Thanks @tjgq, I didn't know where to look for the server log. bazel info also crashed, but it was easy enough to find based on other output.

The root cause is that our CI script deletes not only the contents of $TEMP but the directory itself, so Bazel fails to extract the JNI DLL. The stack is below.

240102 18:51:06.927:WT 1 [com.google.devtools.build.lib.jni.JniLoader.*** Failed to load JNI library from resource
java.nio.file.NoSuchFileException: C:\Users\ADMINI~1\AppData\Local\Temp\bazel-jni.15087218107965058839
	at java.base/sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
	at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
	at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
	at java.base/sun.nio.fs.WindowsFileSystemProvider.createDirectory(Unknown Source)
	at java.base/java.nio.file.Files.createDirectory(Unknown Source)
	at java.base/java.nio.file.TempFileHelper.create(Unknown Source)
	at java.base/java.nio.file.TempFileHelper.createTempDirectory(Unknown Source)
	at java.base/java.nio.file.Files.createTempDirectory(Unknown Source)
	at com.google.devtools.build.lib.jni.JniLoader.loadLibrary(JniLoader.java:91)
	at com.google.devtools.build.lib.jni.JniLoader.(JniLoader.java:60)
	at com.google.devtools.build.lib.platform.SystemSuspensionModule.(SystemSuspensionModule.java:30)
	at java.base/jdk.internal.misc.Unsafe.ensureClassInitialized0(Native Method)
	at java.base/jdk.internal.misc.Unsafe.ensureClassInitialized(Unknown Source)
	at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.ensureClassInitialized(Unknown Source)
	at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newConstructorAccessor(Unknown Source)
	at java.base/jdk.internal.reflect.ReflectionFactory.newConstructorAccessor(Unknown Source)
	at java.base/java.lang.reflect.Constructor.acquireConstructorAccessor(Unknown Source)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Unknown Source)
	at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.createModules(BlazeRuntime.java:[78](https://github.com/EngFlow/engflow/actions/runs/7389414975/job/20102203579#step:3:78)5)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.main(BlazeRuntime.java:761)
	at com.google.devtools.build.lib.bazel.Bazel.main(Bazel.java:[95](https://github.com/EngFlow/engflow/actions/runs/7389414975/job/20102203579#step:3:95))

It's reasonable for Bazel to assume the temp directory exists and crash if it doesn't, so perhaps this issue can be closed without action. However, I'd love a more helpful error message. If the server crashes on startup for whatever reason, it would be great for the client to print that log or tell the user where to find it.

jayconrod avatar Jan 02 '24 19:01 jayconrod

Yes, we should have failed earlier with the more informative error message (and not expect users to go look for it in the log). I have a pending change to do so.

tjgq avatar Jan 02 '24 21:01 tjgq

Yes, we should have failed earlier with the more informative error message (and not expect users to go look for it in the log). I have a pending change to do so.

See https://github.com/bazelbuild/bazel/commit/2d93597df29eacb6a9f1918c9826ea44fb7c30ab.

tjgq avatar Jan 03 '24 10:01 tjgq

FWIW, I had a very similar issue.

I had the following error message pop up:

FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.UnsatisfiedLinkError: 'void com.google.devtools.build.lib.profiler.SystemNetworkStats.getNetIfAddrsNative(java.util.List)'
	at com.google.devtools.build.lib.profiler.SystemNetworkStats.getNetIfAddrsNative(Native Method)
	at com.google.devtools.build.lib.profiler.SystemNetworkStats.getNetIfAddrs(SystemNetworkStats.java:118)
	at com.google.devtools.build.lib.profiler.NetworkMetricsCollector.getLocalLoopbackInterfaces(NetworkMetricsCollector.java:127)
	at com.google.devtools.build.lib.profiler.NetworkMetricsCollector.collectSystemNetworkUsages(NetworkMetricsCollector.java:70)
	at com.google.devtools.build.lib.profiler.CollectLocalResourceUsage$Collector.run(CollectLocalResourceUsage.java:259)

Upon following @tjgq's instructions of opening the server log, I found the root cause to be:

240529 00:16:54.381:I 1 [com.google.devtools.build.lib.analysis.BlazeVersionInfo.logVersionInfo] Bazel version info: Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Thu Jan 01 00:00:00 1970 (0)
Build timestamp: Thu Jan 01 00:00:00 1970 (0)
Build timestamp as int: 0
240529 00:16:54.586:WT 1 [com.google.devtools.build.lib.jni.JniLoader.<clinit>] Failed to load JNI library
java.lang.UnsatisfiedLinkError: /tmp/bazel-jni.1234945722920452513/libunix_jni.so: libstdc++.so.6: cannot open shared object file: No such file or directory
	at java.base/jdk.internal.loader.NativeLibraries.load(Native Method)
	at java.base/jdk.internal.loader.NativeLibraries$NativeLibraryImpl.open(NativeLibraries.java:350)
	at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:216)
	at java.base/jdk.internal.loader.NativeLibraries.loadLibrary(NativeLibraries.java:158)
	at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2399)
	at java.base/java.lang.Runtime.load0(Runtime.java:852)
	at java.base/java.lang.System.load(System.java:2030)
	at com.google.devtools.build.lib.jni.JniLoader.loadLibrary(JniLoader.java:104)
	at com.google.devtools.build.lib.jni.JniLoader.<clinit>(JniLoader.java:45)
	at com.google.devtools.build.lib.platform.SystemSuspensionModule.<clinit>(SystemSuspensionModule.java:30)
	at java.base/jdk.internal.misc.Unsafe.ensureClassInitialized0(Native Method)
	at java.base/jdk.internal.misc.Unsafe.ensureClassInitialized(Unsafe.java:1160)
	at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.ensureClassInitialized(MethodHandleAccessorFactory.java:340)
	at java.base/jdk.internal.reflect.MethodHandleAccessorFactory.newConstructorAccessor(MethodHandleAccessorFactory.java:103)
	at java.base/jdk.internal.reflect.ReflectionFactory.newConstructorAccessor(ReflectionFactory.java:173)
	at java.base/java.lang.reflect.Constructor.acquireConstructorAccessor(Constructor.java:549)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.createModules(BlazeRuntime.java:799)
	at com.google.devtools.build.lib.runtime.BlazeRuntime.main(BlazeRuntime.java:775)
	at com.google.devtools.build.lib.bazel.Bazel.main(Bazel.java:103)

Should we treat the failure to load libunix_jni.so as an error too? One of the problems with not treating it as an error is that bazel info server_log was also crashing bazel for me :grimacing: .

matts1 avatar May 29 '24 00:05 matts1

Which Bazel version did you use? This specific error should be fixed by https://github.com/bazelbuild/bazel/commit/388f6870d666ec2b8e766a8164d3fe01a6bf97db where it checks whether the JNI was successfully loaded before calling into native methods.

coeuvre avatar May 29 '24 08:05 coeuvre

I was using last_green from a few days ago. I can confirm that when I sync to head your fix appears to work for my specific issue.

matts1 avatar May 29 '24 22:05 matts1