Microsoft.Testing.Extensions.HangDump does not work on macOS or Linux in native AOT mode

Open martincostello opened this issue 1 year ago • 32 comments

Describe the bug

Following this tip https://github.com/microsoft/testfx/issues/3095#issuecomment-2166002542 I added the Microsoft.Testing.Extensions.HangDump package to a native AoT test project of mine to try and diagnose a hanging test (which in my case I can only repro on Linux).

Upon making the changes https://github.com/martincostello/alexa-london-travel/pull/1298 and running the CI, the tests fail with a different exception on each of macOS and Linux, as shown below. I can also repro this outside of GitHub Actions CI with WSL.

macOS

Unhandled Exception: System.ArgumentOutOfRangeException: The path '/var/folders/dm/88b38gj92jj53dgxdsm12qf00000gn/T/hangdumpgeneratorpipename.240706229_b5437ec2c2144a679172c9d6fe2a036d/.p' is of an invalid length for use with domain sockets on this platform.  The length must be between 1 and 104 characters, inclusive. (Parameter 'path')
Actual value was /var/folders/dm/88b38gj92jj53dgxdsm12qf00000gn/T/hangdumpgeneratorpipename.240706229_b5437ec2c2144a679172c9d6fe2a036d/.p.
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at Microsoft.Testing.Platform.Hosts.ConsoleTestHost.<InternalRunAsync>d__9.MoveNext() + 0x710
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x100
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x68
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at Microsoft.Testing.Platform.Hosts.ConsoleTestHost.<InternalRunAsync>d__9.MoveNext() + 0xfd0
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x100
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x68
   at Microsoft.Testing.Platform.Hosts.CommonTestHost.<RunAsync>d__6.MoveNext() + 0x2d4
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at Microsoft.Testing.Platform.Hosts.CommonTestHost.<RunAsync>d__6.MoveNext() + 0x670
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x100
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x68
   at Microsoft.Testing.Platform.Hosts.TestHostControlledHost.<RunAsync>d__4.MoveNext() + 0xec
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x100
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x68
   at Microsoft.Testing.Platform.Builder.TestApplication.<RunAsync>d__17.MoveNext() + 0xc4
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x100
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x68
   at Microsoft.Testing.Platform.Helpers.TaskExtensions.<>c.<<TimeoutAfterAsync>b__2_0>d.MoveNext() + 0xa8
   at TestingPlatformEntryPoint.<Main>d__0.MoveNext() + 0x284
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x100
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x68
   at TestingPlatformEntryPoint.<Main>(String[] args) + 0x3c
   at LondonTravel.Skill!<BaseAddress>+0xae7ca8
Unhandled Exception: System.InvalidOperationException: Unexpected state in file '/_/src/Microsoft.Testing.Extensions.HangDump/HangDumpProcessLifetimeHandler.cs' at line '185'
   at Microsoft.Testing.Platform.Helpers.ApplicationStateGuard.Ensure(Boolean, String, Int32) + 0xb0
   at Microsoft.Testing.Extensions.Diagnostics.HangDumpProcessLifetimeHandler.<OnTestHostProcessStartedAsync>d__42.MoveNext() + 0x6c
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x100
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x68
   at Microsoft.Testing.Platform.Hosts.TestHostControllersTestHost.<InternalRunAsync>d__21.MoveNext() + 0x238c
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at Microsoft.Testing.Platform.Hosts.TestHostControllersTestHost.<InternalRunAsync>d__21.MoveNext() + 0x2d48
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x100
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x68
   at Microsoft.Testing.Platform.Hosts.CommonTestHost.<RunAsync>d__6.MoveNext() + 0x2d4
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at Microsoft.Testing.Platform.Hosts.CommonTestHost.<RunAsync>d__6.MoveNext() + 0x670
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x100
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x68
   at Microsoft.Testing.Platform.Builder.TestApplication.<RunAsync>d__17.MoveNext() + 0xc4
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x100
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x68
   at TestingPlatformEntryPoint.<Main>d__0.MoveNext() + 0x284
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x24
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0x100
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x68
   at TestingPlatformEntryPoint.<Main>(String[] args) + 0x3c
   at LondonTravel.Skill!<BaseAddress>+0xae7ca8
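
For context on the ArgumentOutOfRangeException above: the message states a limit of 104 characters for domain socket paths, and the generated pipe path is longer than that. Below is a minimal shell sketch of that length check. `check_socket_path` is a hypothetical helper and not part of testfx; the default limit is taken from the exception message.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a path fits within a domain socket
# path length limit. The default of 104 comes from the macOS exception message.
check_socket_path() {
  local path="$1"
  local limit="${2:-104}"
  local length="${#path}"
  if [ "$length" -ge 1 ] && [ "$length" -le "$limit" ]; then
    echo "ok: $length characters (allowed 1 to $limit)"
    return 0
  fi
  echo "invalid length: $length characters (allowed 1 to $limit)" >&2
  return 1
}

# The pipe path from the macOS failure above.
pipe_path='/var/folders/dm/88b38gj92jj53dgxdsm12qf00000gn/T/hangdumpgeneratorpipename.240706229_b5437ec2c2144a679172c9d6fe2a036d/.p'
check_socket_path "$pipe_path" || echo "pipe path does not fit"
```

The same function can be pointed at any candidate path before it is used as a pipe name, with a different limit passed as the second argument.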

Linux

Unhandled Exception: System.IO.IOException: Broken pipe
 ---> System.Net.Sockets.SocketException (32): Broken pipe
   at System.Exception.SetCurrentStackTrace() + 0x63
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.SetCurrentStackTrace(Exception) + 0x18
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.CreateException(SocketError, Boolean) + 0x72
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.SendAsync(Socket, CancellationToken) + 0xe1
   at System.Net.Sockets.Socket.SendAsync(ReadOnlyMemory`1, SocketFlags, CancellationToken) + 0xea
   at System.IO.Pipes.PipeStream.<WriteAsyncCore>d__83.MoveNext() + 0x93
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine&) + 0x42
   at System.IO.Pipes.PipeStream.WriteAsyncCore(ReadOnlyMemory`1, CancellationToken) + 0x4a
   at System.IO.Pipes.PipeStream.WriteAsync(ReadOnlyMemory`1, CancellationToken) + 0xb4
   at Microsoft.Testing.Platform.IPC.NamedPipeClient.<RequestReplyAsync>d__13`2.MoveNext() + 0x8fb
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine&) + 0x5e
   at Microsoft.Testing.Platform.IPC.NamedPipeClient.RequestReplyAsync[TRequest,TResponse](TRequest, CancellationToken) + 0x63
   at Microsoft.Testing.Extensions.Diagnostics.HangDumpActivityIndicator.<OnTestSessionFinishingAsync>d__33.MoveNext() + 0x13f
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine&) + 0x42
   at Microsoft.Testing.Extensions.Diagnostics.HangDumpActivityIndicator.OnTestSessionFinishingAsync(SessionUid, CancellationToken) + 0x35
   at Microsoft.Testing.Platform.Hosts.CommonTestHost.<NotifyTestSessionEndAsync>d__12.MoveNext() + 0x176
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext, ContextCallback, Object) + 0x8c
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread) + 0x66
   at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(IAsyncStateMachineBox, Boolean) + 0xa7
   at System.Threading.Tasks.Task.RunContinuations(Object) + 0xa8
   at System.Threading.Tasks.Task`1.TrySetResult(TResult) + 0x88
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(Task`1, TResult) + 0x39
   at Microsoft.Testing.Platform.Messages.MessageBusProxy.<DrainDataAsync>d__6.MoveNext() + 0xef
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext, ContextCallback, Object) + 0x8c
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread) + 0x66
   at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(IAsyncStateMachineBox, Boolean) + 0xa7
   at System.Threading.Tasks.Task.RunContinuations(Object) + 0xa8
   at System.Threading.Tasks.Task`1.TrySetResult(TResult) + 0x88
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(Task`1, TResult) + 0x39
   at Microsoft.Testing.Platform.Messages.AsynchronousMessageBus.<DrainDataAsync>d__17.MoveNext() + 0x8a3
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext, ContextCallback, Object) + 0x8c
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread) + 0x66
   at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(IAsyncStateMachineBox, Boolean) + 0xa7
   at System.Threading.Tasks.Task.RunContinuations(Object) + 0xa8
   at System.Threading.Tasks.Task`1.TrySetResult(TResult) + 0x86
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(Task`1, TResult) + 0x38
   at Microsoft.Testing.Platform.Messages.AsyncConsumerDataProcessor.<DrainDataAsync>d__14.MoveNext() + 0x329
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext, ContextCallback, Object) + 0x8c
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread) + 0x66
   at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(IAsyncStateMachineBox, Boolean) + 0xa7
   at System.Threading.Tasks.Task.RunContinuations(Object) + 0xa8
   at System.Threading.Tasks.Task.TrySetResult() + 0x6f
   at System.Threading.Tasks.Task.DelayPromise.CompleteTimedOut() + 0x13
   at System.Threading.TimerQueueTimer.Fire(Boolean) + 0x60
   at System.Threading.TimerQueue.FireNextTimers() + 0x23b
   at System.Threading.ThreadPoolWorkQueue.Dispatch() + 0x2d1
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() + 0x15a
   at System.Threading.Thread.StartThread(IntPtr) + 0xee
   at System.Threading.Thread.ThreadEntryPoint(IntPtr) + 0x19
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
   at System.IO.Pipes.PipeStream.<WriteAsyncCore>d__83.MoveNext() + 0x16b
   --- End of inner exception stack trace ---
   at System.IO.Pipes.PipeStream.<WriteAsyncCore>d__83.MoveNext() + 0x23e
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
   at Microsoft.Testing.Platform.IPC.NamedPipeClient.<RequestReplyAsync>d__13`2.MoveNext() + 0x94c
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
   at Microsoft.Testing.Platform.Helpers.TaskExtensions.<>c.<<TimeoutAfterAsync>b__2_0>d.MoveNext() + 0x96
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
   at Microsoft.Testing.Platform.Hosts.CommonTestHost.<ExecuteRequestAsync>d__8.MoveNext() + 0x318
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
   at Microsoft.Testing.Platform.Hosts.ConsoleTestHost.<InternalRunAsync>d__9.MoveNext() + 0x7b6
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
   at Microsoft.Testing.Platform.Hosts.ConsoleTestHost.<InternalRunAsync>d__9.MoveNext() + 0x1094
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
   at Microsoft.Testing.Platform.Hosts.CommonTestHost.<RunAsync>d__6.MoveNext() + 0x2ae
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
   at Microsoft.Testing.Platform.Hosts.CommonTestHost.<RunAsync>d__6.MoveNext() + 0x5f3
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
   at Microsoft.Testing.Platform.Hosts.TestHostControlledHost.<RunAsync>d__4.MoveNext() + 0xd9
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
   at Microsoft.Testing.Platform.Builder.TestApplication.<RunAsync>d__17.MoveNext() + 0xaf
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
   at TestingPlatformEntryPoint.<Main>d__0.MoveNext() + 0x244
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
   at TestingPlatformEntryPoint.<Main>(String[] args) + 0x2e
   at LondonTravel.Skill!<BaseAddress>+0xa10b7c

Steps To Reproduce

  1. Clone https://github.com/martincostello/alexa-london-travel/pull/1298/commits/b93a00a35e064ab584a45dcdb6b520ee9fdcfa59 onto a Linux or macOS machine.
  2. Run ./build.ps1 in the root of the repository.

Expected behavior

Either:

  • The tests pass, or:
  • The tests produce a dump file relating to a hanging test.

Actual behavior

The process exits with an error of either ArgumentOutOfRangeException on macOS or IOException on Linux.

Additional context

In case it was an issue with the file name I was providing, I tried not specifying a file name at all in the hope of it having a default like dotnet test does. In that case, a different failure is observed:

Unhandled Exception: System.InvalidOperationException: Cannot find mutex 'TESTINGPLATFORM_HANGDUMP_MUTEXNAME_d28475404b504b3aacf951ff5509d4c6'
   at Microsoft.Testing.Extensions.Diagnostics.HangDumpProcessLifetimeHandler.<ActivityTimerAsync>d__44.MoveNext() + 0x39c
--- End of stack trace from previous location ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() + 0x1c
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task) + 0xbe
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task, ConfigureAwaitOptions) + 0x4e
   at Microsoft.Testing.Platform.Helpers.SystemTask.<>c__DisplayClass4_0.<RunLongRunning>b__0() + 0x92
   at System.Threading.Thread.StartThread(IntPtr) + 0xee
   at System.Threading.Thread.ThreadEntryPoint(IntPtr) + 0x19

martincostello avatar Jun 13 '24 16:06 martincostello

@MarcoRossignoli I think it's another bug of pipe name under non-windows, could you check it please?

Evangelink avatar Jun 14 '24 15:06 Evangelink

Thanks @martincostello for reporting it, I'll check the name length issue. That said, I don't know whether hang/crash dumps work for the native AOT version.

@brianrob @tommcdon we're using the environment variable for the crash dump https://learn.microsoft.com/en-us/dotnet/core/diagnostics/collect-dumps-crash and the Microsoft.Diagnostics.NETCore.Client to take a dump in case of hang, is it working in native aot mode?

MarcoRossignoli avatar Jun 14 '24 19:06 MarcoRossignoli

Sorry I totally forgot about this ticket, will work on it this week!

Evangelink avatar Jun 25 '24 10:06 Evangelink

@martincostello Would you be able to test the fix from our preview feed? I tried cloning your repo to test it out but running build.ps1 on Ubuntu, I get

  LondonTravel.Skill failed with 2 error(s) (24.3s) → artifacts/bin/LondonTravel.Skill/release_linux-arm64/bootstrap.dll
    clang : error : linker command failed with exit code 1 (use -v to see invocation)
    /home/amaury/.nuget/packages/microsoft.dotnet.ilcompiler/8.0.6/build/Microsoft.NETCore.Native.targets(366,5): error MSB3073: The command ""clang" "/home/amaury/alexa-london-travel/artifacts/obj/LondonTravel.Skill/release_linux-arm64/native/bootstrap.o" -o "/home/amaury/alexa-london-travel/artifacts/bin/LondonTravel.Skill/release_linux-arm64/native/bootstrap" -Wl,--version-script=/home/amaury/alexa-london-travel/artifacts/obj/LondonTravel.Skill/release_linux-arm64/native/bootstrap.exports -Wl,--export-dynamic -gz=zlib -fuse-ld=bfd /home/amaury/.nuget/packages/runtime.linux-arm64.microsoft.dotnet.ilcompiler/8.0.6/sdk/libbootstrapper.o /home/amaury/.nuget/packages/runtime.linux-arm64.microsoft.dotnet.ilcompiler/8.0.6/sdk/libRuntime.WorkstationGC.a /home/amaury/.nuget/packages/runtime.linux-arm64.microsoft.dotnet.ilcompiler/8.0.6/sdk/libeventpipe-disabled.a /home/amaury/.nuget/packages/runtime.linux-arm64.microsoft.dotnet.ilcompiler/8.0.6/sdk/libstdc++compat.a /home/amaury/.nuget/packages/runtime.linux-arm64.microsoft.dotnet.ilcompiler/8.0.6/framework/libSystem.Native.a /home/amaury/.nuget/packages/runtime.linux-arm64.microsoft.dotnet.ilcompiler/8.0.6/framework/libSystem.IO.Compression.Native.a /home/amaury/.nuget/packages/runtime.linux-arm64.microsoft.dotnet.ilcompiler/8.0.6/framework/libSystem.Net.Security.Native.a /home/amaury/.nuget/packages/runtime.linux-arm64.microsoft.dotnet.ilcompiler/8.0.6/framework/libSystem.Security.Cryptography.Native.OpenSsl.a --target=aarch64-linux-gnu -g -Wl,-rpath,'$ORIGIN' -Wl,--build-id=sha1 -Wl,--as-needed -pthread -ldl -lz -lrt -lm -pie -Wl,-pie -Wl,-z,relro -Wl,-z,now -Wl,--eh-frame-hdr -Wl,--discard-all -Wl,--gc-sections" exited with code 1.

Evangelink avatar Jun 27 '24 13:06 Evangelink

I can try it out tomorrow, but these are the dependency steps the CI runs to install the tooling that native AoT needs on linux: https://github.com/martincostello/alexa-london-travel/blob/9a799b548274c270f8dc61417265d04e5cc19a9d/.github/workflows/build.yml#L65-L74

martincostello avatar Jun 27 '24 13:06 martincostello

It's ok, I can reproduce the error with these lines :) Thanks @martincostello <3

Evangelink avatar Jun 27 '24 14:06 Evangelink

We fixed the pipe name length issue, but it looks like there is still a problem. We haven't yet validated whether hang dump works well in Native AOT mode, so for now the only recommendation would be to not use it for NAOT.

We will work on adding tests for it in our next iteration and will post back the results.

Evangelink avatar Jun 27 '24 15:06 Evangelink

the only recommendation would be to not use it for NAOT

The reason I found this bug in the first place was I was specifically trying to diagnose a hang that is only happening to me in native AoT 😅

martincostello avatar Jun 27 '24 15:06 martincostello

The reason I found this bug in the first place was I was specifically trying to diagnose a hang that is only happening to me in native AoT 😅

We're waiting on some internal info on how to support hang/crash dumps in native AOT mode. We'll let you know as soon as possible.

MarcoRossignoli avatar Jun 27 '24 15:06 MarcoRossignoli

The reason I found this bug in the first place was I was specifically trying to diagnose a hang that is only happening to me in native AoT 😅

@martincostello do you mean that if you run the tests in "normal mode" everything is good, but if you try with native AOT they hang?

MarcoRossignoli avatar Jun 27 '24 15:06 MarcoRossignoli

@brianrob @tommcdon we're using the environment variable for the crash dump https://learn.microsoft.com/en-us/dotnet/core/diagnostics/collect-dumps-crash and the Microsoft.Diagnostics.NETCore.Client to take a dump in case of hang, is it working in native aot mode?

Sorry for the delayed response! Environment variable enabled crash dumps and dotnet-dump adhoc dump collection is possible with NativeAOT if we include a copy of .NET's createdump tool along with the application. The steps needed to collect dumps with NativeAOT are included in https://github.com/dotnet/diagnostics/issues/4150, copy/pasted here:

In .NET 8, using createdump for native AOT applications requires some manual steps:

  • Need the C++ runtime
  • Need access to the 8.0 .NET core version of createdump. The .NET core version of createdump was modified to work with native AOT applications.

cc @LakshanF @agocke

tommcdon avatar Jun 27 '24 19:06 tommcdon

do you mean that if you run test in "normal mode" everything is good but if you try with native aot it's hanging?

Exactly that.

martincostello avatar Jun 28 '24 05:06 martincostello

@martincostello - if you publish the tests for AOT do you get any trim/AOT warnings?

vitek-karas avatar Jun 28 '24 12:06 vitek-karas

Yes, there's a few: https://github.com/martincostello/alexa-london-travel/actions/runs/9710114339/job/26800478379?pr=1298#step:5:208

martincostello avatar Jun 28 '24 12:06 martincostello

Those are not warnings - just info messages. It seems there are no warnings, unless they're suppressed by something.

vitek-karas avatar Jun 28 '24 12:06 vitek-karas

@vitek-karas I haven't seen any warning when building locally. I will double check the configuration to ensure the analyzers are enabled correctly.

Evangelink avatar Jun 28 '24 12:06 Evangelink

@martincostello Hi martin, for debugging Native AOT apps in particular, you might find native tooling better. For instance, you could use GDB or LLDB to attach to the running process and dump the stack directly. As long as symbols are on the machine, you shouldn't need any special technology to do basic investigations. The native compiler should be able to understand the native code.
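
As a rough illustration of "attach to the running process and dump the stack", the sketch below prints a one-shot GDB or LLDB command for a given process ID. `print_backtrace_command` is a hypothetical helper; it only prints the command so it can be reviewed before being run, and 27454 is just an example process ID. Attaching to another process may need elevated permissions depending on the machine's ptrace settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command that attaches a native debugger to a
# running process, prints a backtrace for every thread, and then exits.
print_backtrace_command() {
  local debugger="$1"
  local pid="$2"
  case "$debugger" in
    gdb)  echo "gdb -p $pid -batch -ex 'thread apply all bt'" ;;
    lldb) echo "lldb -p $pid --batch -o 'thread backtrace all'" ;;
    *)    echo "unsupported debugger: $debugger" >&2; return 1 ;;
  esac
}

# Example: show both commands for a process with ID 27454.
print_backtrace_command gdb 27454
print_backtrace_command lldb 27454
```

In a CI job, the printed command could be run from a step that fires when the test step times out, with the output redirected to a file that is uploaded as an artifact.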

agocke avatar Jun 28 '24 17:06 agocke

Specifically I only seem to get the hangs in the native AoT tests themselves.

The AoT application itself is fine, and I don't get the hangs in normal xunit tests, or on Windows or macOS under native AoT. It seems to be exclusively my native AoT tests on Linux, and then only sometimes (and those tests are 99% copy-paste from existing non-AoT tests for xunit).

It's mostly just a minor annoyance (a few retries and it'll go away), but I'd like to get to the bottom of it to know if it's an issue in my code, or in the native AoT test SDK itself.

Using the built-in tooling seemed like the easiest way to grab a file to inspect to find what was hanging from my CI (which is where I experience the issue), but that ultimately led me to opening this issue as I fell at the first hurdle.

martincostello avatar Jun 29 '24 08:06 martincostello

I assume your Native aot tests are also Native aot? Or are they regular coreclr?

agocke avatar Jun 30 '24 14:06 agocke

They use the native AoT support in MSTest.TestFramework.

martincostello avatar Jun 30 '24 15:06 martincostello

Got it. Yup, in that case I would recommend trying to attach with LLDB or GDB. While the dotnet-dump tools have some limited support for Native AOT, the native tools may actually be better in this case.

agocke avatar Jul 01 '24 20:07 agocke

I might try that, but I'll have to learn LLDB/GDB as I've never used them before. That depends on me being able to reproduce it locally - if it only happens in CI then that's basically a non-starter.

martincostello avatar Jul 02 '24 08:07 martincostello

Agreed. I’ll leave this issue open. Unfortunately the person who worked on createdump support with Native aot is out on vacation right now, so it might be a bit before we have a more detailed fix.

agocke avatar Jul 02 '24 14:07 agocke

OK, I've investigated and have some more information. There may be multiple ways that the runtime creates dump files (I haven't surveyed all the possible options) but one is a program called createdump that's bundled with the .NET install. That does seem to work to create full dump files.

What I'm unclear on is: how does the Microsoft.Testing.Extensions.HangDump system work? Does it invoke createdump directly?

agocke avatar Jul 03 '24 01:07 agocke

What I'm unclear on is: how does the Microsoft.Testing.Extensions.HangDump system work? Does it invoke createdump directly?

As reported here https://github.com/microsoft/testfx/issues/3097#issuecomment-2168650810 we use the env vars for the crash dump and the package Microsoft.Diagnostics.NETCore.Client to take a dump of the process in case of hang.

MarcoRossignoli avatar Jul 03 '24 08:07 MarcoRossignoli

OK, I managed to investigate the current state and get a dump out of dotnet-dump, which I think means the same thing can be done using the Microsoft.Diagnostics.NETCore.Client library.

Here are the requirements:

  • The project needs to be built with <EventSourceSupport>true</EventSourceSupport>. This enables the event pipe that allows another process to talk to the AOT project.
  • The createdump binary needs to be next to the AOT binary. That lives either in the SDK installation, or in the Microsoft.NETCore.App.runtime.<rid> runtime pack.
  • createdump needs to be executable.
  • The process needs to have been started with the environment variable DOTNET_DbgEnableMiniDump=1. I think the NETCore.Client library should handle this.

What I think we need to improve on the Native AOT side is error messages. There's no indication of what causes some of these failures.
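
Until the error messages improve, the requirements above that are visible from a shell can be checked up front. This is a minimal sketch, assuming the AOT binary sits in the directory passed as the first argument; `check_dump_prereqs` is a hypothetical helper. It cannot verify EventSourceSupport, which is a build-time property.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check for the runtime requirements listed above.
# Usage: check_dump_prereqs <directory containing the native AOT binary>
check_dump_prereqs() {
  local app_dir="$1"
  local status=0

  # createdump must sit next to the AOT binary and be executable.
  if [ ! -f "$app_dir/createdump" ]; then
    echo "missing: $app_dir/createdump" >&2
    status=1
  elif [ ! -x "$app_dir/createdump" ]; then
    echo "not executable: $app_dir/createdump" >&2
    status=1
  fi

  # The process must be started with DOTNET_DbgEnableMiniDump=1.
  if [ "${DOTNET_DbgEnableMiniDump:-}" != "1" ]; then
    echo "DOTNET_DbgEnableMiniDump is not set to 1" >&2
    status=1
  fi

  return "$status"
}

# Example: check the current directory and report the overall result.
check_dump_prereqs "." || echo "dump prerequisites not met"
```

Each failed requirement is reported on its own line, so a single run shows everything that still needs fixing.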

agocke avatar Jul 10 '24 21:07 agocke

Might also need to set DOTNET_DbgMiniDumpType=4 for the instructions above to get the crash dump.

dotnet-dump could be used to collect dumps as well.

lldb, together with the symbol file, can be used to analyze the dumps collected.
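
A minimal sketch of starting a process with both variables set is shown below. `run_with_crash_dumps` is a hypothetical wrapper; it only sets the two environment variables named in this thread and then runs whatever command it is given.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command with crash dump collection enabled.
# DOTNET_DbgEnableMiniDump=1 turns dump collection on and
# DOTNET_DbgMiniDumpType=4 requests a full dump.
run_with_crash_dumps() {
  env DOTNET_DbgEnableMiniDump=1 DOTNET_DbgMiniDumpType=4 "$@"
}

# Example: show the values a child process would see.
run_with_crash_dumps sh -c 'echo "enabled=$DOTNET_DbgEnableMiniDump type=$DOTNET_DbgMiniDumpType"'
```

In practice the wrapped command would be the published native AOT test binary rather than `sh`.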

LakshanF avatar Jul 31 '24 18:07 LakshanF

I'm trying to follow the steps above to use dotnet-dump to generate a dump to see where the hang is coming from, but I keep getting the following error:

> dotnet dump collect -p 27454

Writing full to /home/martin/core_20240806_094936
Write dump failed - HRESULT: 0x80004005.

I've tried making every createdump executable and setting the mentioned environment variable, but that hasn't resolved it.

If I try and use dotnet gcdump it just hangs.

martincostello avatar Aug 06 '24 08:08 martincostello

Playing around in GitHub Actions, I found a clue that's gotten generation working there.

When I set DOTNET_DbgEnableMiniDump=1 at the job level, I get this warning during publish:

DOTNET_DbgEnableMiniDump is set and the createdump binary does not exist: /home/runner/.nuget/packages/runtime.linux-x64.microsoft.dotnet.ilcompiler/8.0.7/tools/createdump

If I manually copy the file from /usr/share/dotnet/shared/Microsoft.NETCore.App/6.0.32/createdump (found via find /usr/share/dotnet -name createdump) to the publish directory, a dump is created successfully.
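
The manual copy can be scripted roughly as follows. This is a sketch only: `stage_createdump` is a hypothetical helper, and it takes the first createdump it finds under the given root in sorted order. Which runtime version to copy is a judgment call, so passing a more specific root (for example a single versioned runtime directory) narrows the choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a createdump binary from a .NET install into a
# publish directory and mark it executable.
# Usage: stage_createdump <dotnet root> <publish directory>
# Example: stage_createdump /usr/share/dotnet ./publish
stage_createdump() {
  local dotnet_root="$1"
  local publish_dir="$2"
  local source_path

  # An install may contain one createdump per runtime version; take the
  # first match in sorted order so the result is deterministic.
  source_path="$(find "$dotnet_root" -name createdump -type f | sort | head -n 1)"
  if [ -z "$source_path" ]; then
    echo "no createdump found under $dotnet_root" >&2
    return 1
  fi

  mkdir -p "$publish_dir"
  cp "$source_path" "$publish_dir/createdump"
  chmod +x "$publish_dir/createdump"
  echo "copied $source_path to $publish_dir/createdump"
}
```

Running this as a step after publish and before the tests start would put createdump next to the AOT binary, which is the layout the earlier requirements describe.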

martincostello avatar Aug 06 '24 09:08 martincostello

The tips got me far enough that I think I root-caused the deadlock as described in #3485, which I've now been able to create a workaround for in my project. The documentation for trying to do this with a native AoT app certainly needs some TLC for those unfamiliar with tools such as lldb (which is how I eventually got hold of the stacks) 😅.

martincostello avatar Aug 06 '24 13:08 martincostello