ai-dev-gallery

[BUG] App crash when starting ONNX image creation

Open • rs38 opened this issue 1 month ago • 9 comments

Windows 11 25H2 Version 26200 MP (12 procs) Free ARM 64-bit (AArch64)
Product: WinNt, suite: SingleUserTS Personal
Edition build lab: 26100.1.arm64fre.ge_release.240331-1435


Exception Analysis

ClrmaManagedAnalysis::GetThread 6468
ClrmaThread::Initialize 6468
ClrmaThread::Initialize GetThreadStoreData FAILED 80131c49
~ClrmaThread
ClrmaManagedAnalysis::get_ProviderName
ClrmaManagedAnalysis::GetThread ffffffff
ClrmaThread::Initialize 6468
ClrmaThread::Initialize GetThreadStoreData FAILED 80131c49
~ClrmaThread

PROCESS_NAME: AIDevGallery.exe

ERROR_CODE: (NTSTATUS) 0x887a0005 - The GPU device instance has been suspended. Use GetDeviceRemovedReason to determine the appropriate action.

SYMBOL_NAME: dwmcorei!MilFailFastForHR+74

MODULE_NAME: dwmcorei

IMAGE_NAME: dwmcorei.dll

FAILURE_BUCKET_ID: XAML_887a0005_dwmcorei.dll!MilFailFastForHR

FAILURE_ID_HASH: {9470d22f-30b4-fcb6-a78c-fceb9f05fb51}

rs38 • Nov 10 '25 06:11

The crash log shows 0x887A0005 (DXGI_ERROR_DEVICE_REMOVED), meaning the GPU device was stopped or reset by the system. This usually happens under heavy GPU load or because of driver issues.
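Although the debugger prints it under `ERROR_CODE: (NTSTATUS)`, 0x887A0005 follows the HRESULT layout, as does the 8007000E code that shows up later in this thread. A minimal Python sketch for pulling such a value apart, assuming the standard HRESULT bit layout (bit 31 = failure, bits 16 to 28 = facility, bits 0 to 15 = code); the two name tables are a small hand-picked subset for the codes seen here, not a complete list:

```python
# Decode an HRESULT into its severity, facility and code fields.
# The name tables are a hand-picked subset covering the codes in this thread.
FACILITY_NAMES = {
    0x007: "FACILITY_WIN32",
    0x87A: "FACILITY_DXGI",
}
KNOWN_HRESULTS = {
    0x887A0005: "DXGI_ERROR_DEVICE_REMOVED",
    0x8007000E: "E_OUTOFMEMORY",
}


def decode_hresult(hr: int) -> dict:
    hr &= 0xFFFFFFFF                      # accept signed values, e.g. from .NET HResult
    facility = (hr >> 16) & 0x1FFF        # same mask as the HRESULT_FACILITY macro
    return {
        "failed": bool(hr & 0x80000000),  # severity bit
        "facility": facility,
        "facility_name": FACILITY_NAMES.get(facility, "unknown"),
        "code": hr & 0xFFFF,              # for FACILITY_WIN32 this is the Win32 error number
        "name": KNOWN_HRESULTS.get(hr, "unknown"),
    }


if __name__ == "__main__":
    for value in (0x887A0005, 0x8007000E):
        print(hex(value), decode_hresult(value))
```

For 0x8007000E the code field is 14, the Win32 `ERROR_OUTOFMEMORY`, whose message is the "Not enough memory resources are available to complete this operation." text in the later traces.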

Suggested steps:

  • Update your graphics driver and install pending Windows updates.
  • Try lowering the image resolution or batch size.
  • Test with GPU acceleration disabled (CPU/WARP mode).
  • Close other GPU-intensive apps before running ONNX image creation.

If possible, please share:

  1. Device model and GPU driver version.
  2. Steps to reproduce.
  3. DxDiag or DRED logs for deeper analysis.

weiyuanyue • Nov 10 '25 08:11

Device model and GPU driver version

HP OmniBook X laptop, 16 GB RAM; for more details, see DxDiag.txt.


Steps to reproduce:

Start the AI Dev Gallery app -> choose Image creation. The only available configuration is ONNX Stable Diffusion 1.4 (no need to upload a picture). The whole laptop instantly freezes for 10 seconds. aidevgallery://models/36a56423-1be1-4577-9437-3e68435cada2

DxDiag or DRED logs for deeper analysis:

DxDiag.txt

rs38 • Nov 10 '25 11:11

I just re-downloaded it and tried again, and now I get an error message about insufficient memory resources:

Type: OnnxRuntimeException
Message: [ErrorCode:RuntimeException] Exception during initialization: C:\__w\1\s\win-onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\FusedGraphKernel.cpp(70)\onnxruntime.DLL!00007FFE8B2EC74C: (caller: 00007FFE8B2EB124) Exception(2) tid(2f90) 8007000E Not enough memory resources are available to complete this operation.

StackTrace:    at Microsoft.ML.OnnxRuntime.InferenceSession.Init(String modelPath, SessionOptions options, PrePackedWeightsContainer prepackedWeightsContainer)
   at AIDevGallery.Samples.SharedCode.StableDiffusionCode.VaeDecoder.<>c__DisplayClass4_0.<<GetInferenceSession>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at AIDevGallery.Samples.SharedCode.StableDiffusionCode.VaeDecoder.CreateAsync(StableDiffusionConfig config, String modelPath, WinMlSampleOptions winMlSampleOptions)
   at AIDevGallery.Samples.SharedCode.StableDiffusionCode.StableDiffusion.InitializeAsync(WinMlSampleOptions winMlSampleOptions)
   at AIDevGallery.Samples.OpenSourceModels.StableDiffusionImageGeneration.GenerateImage.LoadModelAsync(SampleNavigationParameters sampleParams)

rs38 • Nov 10 '25 11:11

Switching to "use CPU" instead of GPU makes it work: slow, but no crash.
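The observation that CPU works while GPU fails could be turned into an automatic fallback. A minimal Python sketch, assuming a hypothetical `factory(provider)` callable that builds an inference session for one execution provider (with onnxruntime's Python API it could wrap `onnxruntime.InferenceSession(model_path, providers=[provider])`). This only helps when the failure surfaces as a catchable exception, like the 8007000E `OnnxRuntimeException` above; a GPU device removal that fail-fasts the whole process, as in the `dwmcorei` crash, cannot be caught this way.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def create_with_fallback(
    factory: Callable[[str], T], providers: Sequence[str]
) -> Tuple[T, str, List[Tuple[str, Exception]]]:
    """Try each execution provider in order and return the first session that initializes.

    Returns the session, the provider that worked, and the errors raised by
    the providers that were tried before it.
    """
    errors: List[Tuple[str, Exception]] = []
    for provider in providers:
        try:
            return factory(provider), provider, errors
        except Exception as exc:  # e.g. an out-of-memory error during GPU initialization
            errors.append((provider, exc))
    raise RuntimeError(f"no execution provider could initialize the model: {errors}")


if __name__ == "__main__":
    # Hypothetical stand-in for session creation: the GPU path fails, the CPU path works.
    def fake_factory(provider: str) -> str:
        if provider == "DmlExecutionProvider":
            raise MemoryError("8007000E: not enough memory resources")
        return f"session on {provider}"

    session, used, failures = create_with_fallback(
        fake_factory, ["DmlExecutionProvider", "CPUExecutionProvider"]
    )
    print(used, [name for name, _ in failures])
```

Returning the collected errors alongside the session keeps the GPU failure visible to the user instead of silently hiding it behind a slower CPU run.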

rs38 • Nov 10 '25 11:11

Thanks for reporting this! The issue seems related to GPU driver and DirectML (DML) compatibility during Stable Diffusion initialization under high memory pressure: 0x887A0005 indicates the GPU device was reset, and 8007000E means there was not enough memory during initialization. Your device (ARM-based Snapdragon X Elite, 16 GB UMA, i.e. memory shared between CPU and GPU) may hit peak memory usage when loading SD 1.4, causing the driver to reset.
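To illustrate why load order and residency matter for peak memory when a multi-model pipeline such as SD 1.4 is initialized, here is a back-of-the-envelope Python sketch. The component sizes, the `transient_factor` (extra temporary memory while one session initializes) and the budget are hypothetical placeholders, not measured values for this model or device.

```python
from typing import Sequence

# Hypothetical per-component sizes in MB, in load order; NOT measured values for SD 1.4.
SIZES_MB = [500, 3500, 200]  # e.g. text encoder, UNet, VAE decoder


def peak_load_mb(sizes_in_load_order: Sequence[float], keep_loaded: bool,
                 transient_factor: float = 1.0) -> float:
    """Estimate peak memory while a pipeline's sessions are created one after another.

    keep_loaded=True: earlier sessions stay resident while later ones load.
    keep_loaded=False: each session is released before the next one loads.
    transient_factor: assumed multiplier for temporary memory while one session initializes.
    """
    peak = 0.0
    resident = 0.0
    for size in sizes_in_load_order:
        peak = max(peak, resident + size * transient_factor)
        if keep_loaded:
            resident += size
    return peak


if __name__ == "__main__":
    budget_mb = 8192  # hypothetical GPU-usable share of a 16 GB UMA system
    for keep in (True, False):
        peak = peak_load_mb(SIZES_MB, keep, transient_factor=2.0)
        print(f"keep_loaded={keep}: peak ~{peak:.0f} MB, fits={peak <= budget_mb}")
```

Keeping every session resident trades a lower per-image latency for a higher peak; releasing sessions between stages lowers the peak but reloads models on each run.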

  • Could you try updating your Adreno GPU driver to the latest version (via Windows Update or OEM support)? AFAIK, the latest version is 31.0.112.0.
  • The DxDiag.txt file you shared cannot be opened :-( It doesn't look like a text file right now. Could you re-check it?
  • DML EP has known performance and compatibility limitations, and we are working on improvements.

weiyuanyue • Nov 11 '25 03:11

https://github.com/user-attachments/files/23452833/DxDiag.txt My attachment is a pretty plain .txt file?! Here it is again, zipped: DxDiag.zip

I agree, 16 GB is a bit on the low side, but it works pretty well for everything else on this WinARM Copilot+ gadget.

The latest drivers in the Microsoft Update Catalog are from 2023: https://www.catalog.update.microsoft.com/Search.aspx?q=qualcomm&scol=DateComputed&sdir=desc

The latest driver package from HP (7700.1 Rev.E, 426.7 MB, Aug 20, 2025) provides exactly the version I have installed.

I found 31.0.121 at Qualcomm and installed it, but I get the same memory error:

Type: OnnxRuntimeException
Message: [ErrorCode:RuntimeException] Exception during initialization: C:\__w\1\s\win-onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\FusedGraphKernel.cpp(70)\onnxruntime.DLL!00007FFDDDD3C74C: (caller: 00007FFDDDD3B124) Exception(2) tid(4db8) 8007000E Not enough memory resources are available to complete this operation.

StackTrace:    at Microsoft.ML.OnnxRuntime.InferenceSession.Init(String modelPath, SessionOptions options, PrePackedWeightsContainer prepackedWeightsContainer)
   at AIDevGallery.Samples.SharedCode.StableDiffusionCode.StableDiffusion.<>c__DisplayClass10_0.<<GetInferenceSession>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at AIDevGallery.Samples.SharedCode.StableDiffusionCode.StableDiffusion.InitializeAsync(WinMlSampleOptions winMlSampleOptions)
   at AIDevGallery.Samples.OpenSourceModels.StableDiffusionImageGeneration.GenerateImage.LoadModelAsync(SampleNavigationParameters sampleParams)

rs38 • Nov 11 '25 07:11

Type: OnnxRuntimeException
Message: [ErrorCode:Fail] Failed to finalize QNN graph.
StackTrace:    at Microsoft.ML.OnnxRuntime.InferenceSession.Init(String modelPath, SessionOptions options, PrePackedWeightsContainer prepackedWeightsContainer)
   at AIDevGallery.Samples.SharedCode.StableDiffusionCode.TextProcessing.<>c__DisplayClass5_0.<<GetInferenceSession>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at AIDevGallery.Samples.SharedCode.StableDiffusionCode.TextProcessing.CreateAsync(StableDiffusionConfig config, String tokenizerPath, String encoderPath, WinMlSampleOptions winMlSampleOptions)
   at AIDevGallery.Samples.SharedCode.StableDiffusionCode.StableDiffusion.InitializeAsync(WinMlSampleOptions winMlSampleOptions)
   at AIDevGallery.Samples.OpenSourceModels.StableDiffusionImageGeneration.GenerateImage.LoadModelAsync(SampleNavigationParameters sampleParams)

Trying "compile model":

Type: KeyNotFoundException
Message: The given key '13' was not present in the dictionary.
StackTrace:    at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
   at Microsoft.ML.OnnxRuntime.OnnxRuntimeException..ctor(ErrorCode errorCode, String message)
   at AIDevGallery.Samples.SharedCode.WinMLHelpers.GetCompiledModel(SessionOptions sessionOptions, String modelPath, String device)
   at AIDevGallery.Samples.SharedCode.StableDiffusionCode.TextProcessing.<>c__DisplayClass5_0.<<GetInferenceSession>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at AIDevGallery.Samples.SharedCode.StableDiffusionCode.TextProcessing.CreateAsync(StableDiffusionConfig config, String tokenizerPath, String encoderPath, WinMlSampleOptions winMlSampleOptions)
   at AIDevGallery.Samples.SharedCode.StableDiffusionCode.StableDiffusion.InitializeAsync(WinMlSampleOptions winMlSampleOptions)
   at AIDevGallery.Samples.OpenSourceModels.StableDiffusionImageGeneration.GenerateImage.LoadModelAsync(SampleNavigationParameters sampleParams)
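In this trace the `KeyNotFoundException` is thrown from inside the `OnnxRuntimeException` constructor, which suggests the native runtime reported an error code (13) that the managed wrapper's code-to-name lookup did not contain, so the original error message was lost. A defensive lookup avoids masking the real error. A minimal Python sketch of the pattern; the table is a hypothetical subset, with the numbers for `Fail` and `RuntimeException` assumed to mirror onnxruntime's `OrtErrorCode` numbering:

```python
# Hypothetical subset of error-code names; an unmapped code must not hide the message.
ERROR_CODE_NAMES = {1: "Fail", 6: "RuntimeException"}


def format_runtime_error(code: int, message: str) -> str:
    # dict.get with a default instead of indexing, so an unknown code still yields the message
    name = ERROR_CODE_NAMES.get(code, f"Unknown({code})")
    return f"[ErrorCode:{name}] {message}"


if __name__ == "__main__":
    print(format_runtime_error(1, "Failed to finalize QNN graph."))
    print(format_runtime_error(13, "original native error text"))
```

The output format imitates the `[ErrorCode:...]` prefix seen in the messages above; the point is only that the fallback branch keeps the native message instead of replacing it with a lookup failure.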

rs38 • Nov 11 '25 07:11

Completely different laptop (Surface Book 3 15"), 32 GB RAM, NVIDIA GTX with 6 GB:

Pretty similar behaviour at image creation: same model, GPU selected; it crashes without warning and makes the system unresponsive for a few seconds. I'm not sure it makes sense to provide all the details, as it looks like a general problem. I can try a third laptop (32 GB, NVIDIA RTX) later...

FILE_IN_CAB:  AIDevGallery.exe.12152.dmp

NTGLOBALFLAG:  0

APPLICATION_VERIFIER_FLAGS:  0

CONTEXT:  (.ecxr)
rax=0000000000000001 rbx=000000e9e0e7c2e0 rcx=0000000000000007
rdx=00000252698900c0 rsi=00000000ffffffff rdi=000000e9e0e7d0e0
rip=00007ffd58694ace rsp=000000e9e0e7b760 rbp=000000e9e0e7b8c0
 r8=7ffffffffffffffc  r9=000000e9e0962000 r10=00000000000ec940
r11=0000000000000001 r12=000000e9e0e7baf0 r13=0000000000000000
r14=000000e9e0e7c100 r15=000000e9e0e7bac0
iopl=0         nv up ei pl nz na pe nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
ucrtbase!abort+0x4e:
00007ffd`58694ace cd29            int     29h
Resetting default scope

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007ffd58694ace (ucrtbase!abort+0x000000000000004e)
   ExceptionCode: c0000409 (Security check failure or stack buffer overrun)
  ExceptionFlags: 00000001
NumberParameters: 1
   Parameter[0]: 0000000000000007
Subcode: 0x7 FAST_FAIL_FATAL_APP_EXIT 

PROCESS_NAME:  AIDevGallery.exe

ERROR_CODE: (NTSTATUS) 0xc0000409 - The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application.

EXCEPTION_CODE_STR:  c0000409

EXCEPTION_PARAMETER1:  0000000000000007

STACK_TEXT:  
000000e9`e0e7b760 00007ffd`586b19de     : 000000e9`00000003 00000000`00000003 00000000`ffffffff 00000000`00000000 : ucrtbase!abort+0x4e
000000e9`e0e7b790 00007ffc`962c7fd4     : 000000e9`e0e7c2e0 000000e9`e0e7b8c0 000000e9`e0e7c100 000000e9`e0e7d0e0 : ucrtbase!terminate+0x1e
000000e9`e0e7b7c0 00007ffc`962c87bc     : 00000000`00000001 00000000`e06d7363 00000081`e06d7363 00000000`00000000 : onnxruntime!FindHandler<__FrameHandler4>+0x530
000000e9`e0e7b990 00007ffc`962c8821     : 00007ffc`95620000 000000e9`e0e7d0e0 000000e9`e0e7c2e0 000000e9`e0e7c100 : onnxruntime!__InternalCxxFrameHandler<__FrameHandler4>+0x278
000000e9`e0e7ba30 00007ffc`962c6dd2     : 00007ffc`95620000 000000e9`e0e7d0e0 000000e9`e0e7c2e0 000000e9`e0e7c100 : onnxruntime!__InternalCxxFrameHandlerWrapper<__FrameHandler4>+0x35
000000e9`e0e7ba80 00007ffc`962c40b8     : 000000e9`e0e7ec30 00007ffc`968918a0 000000e9`e0e7d0e0 000000e9`e0e7ec30 : onnxruntime!_CxxFrameHandler4+0xb2
000000e9`e0e7baf0 00007ffd`5afc632f     : 000000e9`e0e7d0e0 000000e9`e0e7c0b0 00000000`00000000 00000000`00000081 : onnxruntime!_GSHandlerCheck_EH4+0x64
000000e9`e0e7bb20 00007ffd`5ae72327     : 000000e9`e0e7d0e0 00007ffc`95620000 00007ffc`9570ca03 00007ffc`969ea014 : ntdll!RtlpExecuteHandlerForException+0xf
000000e9`e0e7bb50 00007ffd`5ae6a961     : 00000000`00000000 000000e9`e0e7cf90 00000000`00000000 000000e9`e0e7d0e0 : ntdll!RtlDispatchException+0x437
000000e9`e0e7c2a0 00007ffd`580b804a     : 00000000`00000201 000000e9`e0e7d1e8 00000000`00000001 00000000`e06d7363 : ntdll!RtlRaiseException+0x221
000000e9`e0e7d0c0 00007ffc`962c642f     : 00007ffc`9570fa10 00000000`00000000 00000000`00000200 000000e9`e0e7d420 : KERNELBASE!RaiseException+0x8a
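Read bottom-up, this stack suggests a C++ exception raised inside onnxruntime (`KERNELBASE!RaiseException`) whose handler search ended in `terminate` and `abort`, which then fail-fasts. Despite the alarming description, 0xC0000409 is also the status every `__fastfail` uses, and subcode 7 (`FAST_FAIL_FATAL_APP_EXIT`) points to a deliberate fatal exit rather than an actual buffer overrun. The value 0xE06D7363 in the frame arguments is the exception code MSVC uses for C++ `throw`; its low three bytes spell "msc" in ASCII. A small Python sketch to check this, assuming only the standard NTSTATUS layout (bits 30 to 31 = severity, bit 29 = customer-defined, bits 16 to 27 = facility):

```python
MSVC_CPP_EXCEPTION = 0xE06D7363          # code raised by MSVC for a C++ `throw`
STATUS_STACK_BUFFER_OVERRUN = 0xC0000409  # also used by __fastfail
FAST_FAIL_FATAL_APP_EXIT = 7


def ntstatus_fields(status: int) -> dict:
    status &= 0xFFFFFFFF
    return {
        "severity": status >> 30,               # 3 = error
        "customer": bool(status & 0x20000000),  # set for application-defined codes
        "facility": (status >> 16) & 0x0FFF,
        "code": status & 0xFFFF,
    }


def is_msvc_cpp_exception(status: int) -> bool:
    return (status & 0xFFFFFFFF) == MSVC_CPP_EXCEPTION


def ascii_tag(status: int) -> str:
    # The low three bytes of 0xE06D7363 are 0x6D 0x73 0x63, i.e. "msc".
    return (status & 0xFFFFFF).to_bytes(3, "big").decode("ascii", errors="replace")


if __name__ == "__main__":
    print(ntstatus_fields(MSVC_CPP_EXCEPTION), ascii_tag(MSVC_CPP_EXCEPTION))
    print(ntstatus_fields(STATUS_STACK_BUFFER_OVERRUN), "subcode", FAST_FAIL_FATAL_APP_EXIT)
```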

rs38 • Nov 11 '25 08:11

32 GB laptop, but only a 4 GB RTX: in this case it's probably just not possible to squeeze the 5 GB model into it. It crashed at first but worked after reloading the model. With CPU it can render...

Type: OnnxRuntimeException
Message: [ErrorCode:RuntimeException] Exception during initialization: C:\__w\1\s\win-onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(934)\onnxruntime.DLL!00007FF845B46A9A: (caller: 00007FF845BDAFFE) Exception(1) tid(5988) 8007000E Not enough memory resources are available to complete this operation.

StackTrace:    at Microsoft.ML.OnnxRuntime.InferenceSession.Init(String modelPath, SessionOptions options, PrePackedWeightsContainer prepackedWeightsContainer)
   at AIDevGallery.Samples.SharedCode.StableDiffusionCode.StableDiffusion.<>c__DisplayClass10_0.<<GetInferenceSession>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at AIDevGallery.Samples.SharedCode.StableDiffusionCode.StableDiffusion.InitializeAsync(WinMlSampleOptions winMlSampleOptions)
   at AIDevGallery.Samples.OpenSourceModels.StableDiffusionImageGeneration.GenerateImage.LoadModelAsync(SampleNavigationParameters sampleParams)

Faulting application name: AIDevGallery.exe, version: 0.5.0.4, time stamp: 0x68a40000
Faulting module name: Microsoft.UI.Xaml.dll, version: 3.2.0.0, time stamp: 0x959c9563
Exception code: 0xc000027b
Fault offset: 0x00000000003a157d
Faulting process id: 0x8010
Faulting application start time: 0x1DC53A0C3E00329
Faulting application path: C:\Program Files\WindowsApps\Microsoft.AIDevGallery_0.5.0.0_x64__8wekyb3d8bbwe\AIDevGallery.exe
Faulting module path: C:\Program Files\WindowsApps\Microsoft.AIDevGallery_0.5.0.0_x64__8wekyb3d8bbwe\Microsoft.UI.Xaml.dll
Report Id: d5ed6e1f-d13b-47ce-83a7-e7aa35640fbc
Faulting package full name: Microsoft.AIDevGallery_0.5.0.0_x64__8wekyb3d8bbwe
Faulting package-relative application ID: App
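The `Faulting application start time` field in this event-log entry is a Windows FILETIME: a count of 100-nanosecond intervals since 1601-01-01 UTC. A minimal Python conversion using only the standard library, handy for correlating the crash with the debugger dump and the comment timestamps:

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    # FILETIME counts 100 ns ticks; integer-divide by 10 to get microseconds.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


if __name__ == "__main__":
    print(filetime_to_datetime(0x1DC53A0C3E00329).isoformat())
```

For the value in this report it yields 2025-11-12 06:51:26 UTC.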

rs38 • Nov 12 '25 10:11