DirectML support
Hello, DirectML is a machine learning API built on top of DirectX. With it, models can be run on low-end GPU devices on Windows.
> models can be run on low end gpu devices on windows

Both CLBlast and OpenBLAS provide this capability. What makes DirectML special?
The reason is that Windows supports it natively; the Windows SDK (alongside DirectX) is all you need to compile against it.
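To illustrate the "just the Windows SDK" point, a build setup could look something like the following CMake fragment (a rough sketch; the target name `my_app` is mine, and it assumes a recent Windows SDK, or the Microsoft.AI.DirectML redistributable, that ships `DirectML.h`/`DirectML.lib`):

```cmake
# Minimal sketch: link a Windows target against Direct3D 12 and DirectML.
# No OpenCL, CUDA, or vendor SDK required -- only what the Windows SDK provides.
add_executable(my_app main.cpp)
target_link_libraries(my_app PRIVATE d3d12 dxgi DirectML)
```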
I want to know whether Windows on ARM64 is supported. How can I check?
As far as I know, yes, it's supported. Check this out: it says that DirectML version 1.5 supports it. Also, it's natively available through DirectX, meaning no OpenCL requirement on Windows.
The main benefit of supporting DirectML imo would be support for UWP and Xbox, which afaik don't support any of the other currently implemented backends. Definitely understand that not being a priority, but it would be concretely beneficial for those platforms.
Adding support for DirectML would be great for Windows users. It supports Intel, Nvidia, and AMD GPUs, and more; see DirectML-ExecutionProvider.html#directml-execution-provider. In addition, all that's needed to compile with it is the Windows SDK, and it's natively supported.
@ggerganov Do you plan on adding it? Can you guide contributors here?
We'll take a look, but not sure if it is viable option - see https://github.com/ggerganov/llama.cpp/issues/7772
@ggerganov
I'm pretty sure that it's viable. There's another project that uses an old version of ggml combined with DirectML: Const-me/Whisper.
It transcribes a 20s clip in 10s with the medium model, while whisper.cpp currently takes 60s!
| Project | Model Version | Backend | Transcription time (20s clip) | Hardware |
|---|---|---|---|---|
| Const-me/Whisper | Medium | DirectML | 10s | AMD Ryzen 5 4500U |
| whisper.cpp | Medium | OpenCL | 23s | AMD Ryzen 5 4500U |
| ctranslate2-rs | Medium | Intel MKL | 39s | AMD Ryzen 5 4500U |
| ctranslate2-rs | Medium | ctranslate2 | 42s | AMD Ryzen 5 4500U |
| whisper.cpp | Medium | OpenBLAS | 60s | AMD Ryzen 5 4500U |
| sherpa-rs | Medium | onnxruntime (standard) | 68s | AMD Ryzen 5 4500U |
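To put the table in perspective, here is a quick sketch of the relative speedups, taking the whisper.cpp OpenBLAS run as the baseline (the dict keys are my own labels combining the project and backend columns; the timings are the ones reported above, all on the same Ryzen 5 4500U):

```python
# Reported transcription times (seconds) for a 20s clip, medium model,
# all on an AMD Ryzen 5 4500U (numbers from the table above).
times = {
    "Const-me/Whisper (DirectML)": 10,
    "whisper.cpp (OpenCL)": 23,
    "ctranslate2-rs (Intel MKL)": 39,
    "ctranslate2-rs (ctranslate2)": 42,
    "whisper.cpp (OpenBLAS)": 60,
    "sherpa-rs (onnxruntime)": 68,
}

baseline = times["whisper.cpp (OpenBLAS)"]
for name, t in times.items():
    # Speedup relative to the whisper.cpp OpenBLAS baseline.
    print(f"{name}: {baseline / t:.1f}x vs OpenBLAS baseline")
```

By this measure the DirectML build is about 6x faster than the OpenBLAS build on the same hardware.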
> Both CLBlast and OpenBLAS provide this capability. What makes DirectML special?
There's no support for CLBlast in whisper.cpp anymore, and where it was used, it was slower; please see the latest comparison I added. DirectML is the best choice for optimization on Windows: it will run faster on AMD, Nvidia, CPUs, and more. The current performance of whisper.cpp on Windows clearly demonstrates the need for this optimization.