
When CLR hasn't been initialized in the process, SOS should mention that when commands fail

Open TheJCAB opened this issue 4 years ago • 1 comments

This came up during an investigation of SOS-driven debugging of Time Travel Debugging traces. All SOS commands were failing with varied error messages, but the cause for all failures was ultimately the same. GetThreadStoreData failed because the CLR was loaded but not yet initialized in the debugged process.

Before displaying a specific error for a given command, SOS should verify the current status of the relevant process and thread, and display a better, common message if they are not fully initialized.

These are the sorts of errors that were reported, all having ultimately the same root cause of CLR being uninitialized:

    0:004> !clrstack
    OS Thread Id: 0x4304 (4)
    Unable to walk the managed stack. The current thread is likely not a managed thread. You can run !threads to get a list of managed threads in the process
    Failed to start stack walk: 80070057
    0:004> !dumpheap -stat
    Error requesting GC Heap data
    Unable to build snapshot of the garbage collector state
    0:004> !threads
    Failed to request ThreadStore
    0:004> !mthreads
    No supported CLR version found.
    WARNING: Unable to register for CLR module notifications

TheJCAB avatar Apr 20 '21 17:04 TheJCAB

One note here: We always expect ThreadStore data to be placed into well-formed CLR dumps (even down to triage dumps). This means that if the DAC request for ThreadStoreData fails, we are in one of two situations:

  1. The runtime was not initialized. This could be from calling LoadLibrary("clr.dll"); and not doing anything with it, for example, or simply because the process crashed or was stopped before CLR initialization got far enough to set ThreadStore::s_pThreadStore.
  2. The crash dump was taken without calling into our EnumMemRegions helper.

Adding a helper function to SOS that calls GetThreadStoreData and prints a message with the above information if it fails should help eliminate confusion.
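A rough sketch of what such a helper might look like. This is not SOS code: `HResult`, `ThreadStoreData`, `DacLike`, `FakeDac`, and `CheckRuntimeInitialized` are hypothetical stand-ins so the example compiles on its own. A real version would presumably go through the DAC interface SOS already holds and use its own output routines.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-ins; real SOS would use HRESULT and the DAC's own types.
using HResult = std::int32_t;
constexpr HResult kOk = 0;
constexpr HResult kFail = static_cast<HResult>(0x80004005);  // numeric value of E_FAIL

struct ThreadStoreData {
    int threadCount = 0;
};

// Minimal interface modeled on the DAC's GetThreadStoreData request (assumed shape).
struct DacLike {
    virtual ~DacLike() = default;
    virtual HResult GetThreadStoreData(ThreadStoreData* data) = 0;
};

// Returns an empty string when the ThreadStore request succeeds. Otherwise
// returns one common diagnostic that every command could print before (or
// instead of) its own command-specific error.
std::string CheckRuntimeInitialized(DacLike* dac) {
    assert(dac != nullptr);
    ThreadStoreData data;
    const HResult hr = dac->GetThreadStoreData(&data);
    if (hr >= 0) {  // same test as the SUCCEEDED(hr) macro
        return {};
    }
    return "Failed to request ThreadStore. Either the runtime was not initialized "
           "when the process stopped, or the dump was taken without calling the "
           "runtime's EnumMemRegions helper.";
}

// Test double that returns a fixed result, standing in for a real target.
struct FakeDac : DacLike {
    explicit FakeDac(HResult r) : result(r) {}
    HResult GetThreadStoreData(ThreadStoreData*) override { return result; }
    HResult result;
};
```

Each command (for example !clrstack or !dumpheap) could call the helper first and print the returned text when it is non-empty, so that all of the failures listed earlier report the same root cause.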

leculver avatar Apr 20 '21 19:04 leculver