
DirectML support

Open brightening-eyes opened this issue 2 years ago • 10 comments

Hello, DirectML is a machine learning API built on top of DirectX. With this capability, models can be run on low-end GPU devices on Windows.

brightening-eyes avatar Jul 22 '23 08:07 brightening-eyes

models can be run on low end gpu devices on windows

Both CLBlast and OpenBLAS provide this capability. What makes DirectML special?

LoganDark avatar Jul 24 '23 00:07 LoganDark

The reason is that Windows supports it natively: the Windows SDK (alongside DirectX) is all that's needed to compile against it.
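To illustrate the "only the Windows SDK" claim, a build setup could look roughly like the following. This is a minimal sketch, assuming MSVC and a recent Windows 10 SDK (which ships DirectML.h and DirectML.lib); the project name and source file are hypothetical:

```cmake
# Minimal sketch: DirectML headers and its import library ship with
# recent Windows 10 SDKs, so no extra dependency manager is required.
cmake_minimum_required(VERSION 3.20)
project(dml_example LANGUAGES CXX)   # hypothetical project name

add_executable(dml_example main.cpp) # hypothetical source file

# d3d12, dxgi, and DirectML import libraries are resolved from the
# Windows SDK library paths when building with MSVC.
target_link_libraries(dml_example PRIVATE d3d12 dxgi DirectML)
```

In other words, no OpenCL ICD, BLAS library, or vendor SDK has to be installed separately, which is the portability argument being made here.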

brightening-eyes avatar Jul 24 '23 06:07 brightening-eyes

models can be run on low end gpu devices on windows

Both CLBlast and OpenBLAS provide this capability. What makes DirectML special?

I want to know whether Windows on ARM64 is supported? How can I check?

shijunz avatar Jul 29 '23 04:07 shijunz

As far as I know, yes, it's supported. Check this out: it says that version 1.5 of DirectML supports it. Also, it's natively available through DirectX, meaning there is no OpenCL requirement on Windows.

brightening-eyes avatar Jul 29 '23 07:07 brightening-eyes

The main benefit of supporting DirectML imo would be support for UWP and Xbox, which afaik don't support any of the other currently implemented backends. Definitely understand that not being a priority, but it would be concretely beneficial for those platforms.

shadowndacorner avatar May 01 '24 17:05 shadowndacorner

Adding support for DirectML would be great for Windows users. It supports Intel, Nvidia, AMD, and more. See DirectML-ExecutionProvider.html#directml-execution-provider. In addition, all that's needed to compile it is the Windows SDK, and it's natively supported.

@ggerganov Do you plan on adding it? Can you guide contributors here?

thewh1teagle avatar Jul 15 '24 11:07 thewh1teagle

We'll take a look, but not sure if it is a viable option - see https://github.com/ggerganov/llama.cpp/issues/7772

ggerganov avatar Jul 16 '24 07:07 ggerganov

@ggerganov

I'm pretty sure that it's viable. There's another project that uses an old version of ggml combined with DirectML: Const-me/Whisper. It transcribes 20s of audio in 10s with the medium model, while whisper.cpp currently takes 60s!

| Project | Model Version | Backend | Transcription time (20s) | Hardware |
|---|---|---|---|---|
| Const-me/Whisper | Medium | DirectML | 10s | AMD Ryzen 5 4500U |
| whisper.cpp | Medium | OpenCL | 23s | AMD Ryzen 5 4500U |
| ctranslate2-rs | Medium | Intel MKL | 39s | AMD Ryzen 5 4500U |
| ctranslate2-rs | Medium | ctranslate2 | 42s | AMD Ryzen 5 4500U |
| whisper.cpp | Medium | OpenBLAS | 60s | AMD Ryzen 5 4500U |
| sherpa-rs | Medium | onnxruntime (standard) | 68s | AMD Ryzen 5 4500U |

thewh1teagle avatar Jul 21 '24 16:07 thewh1teagle

Both CLBlast and OpenBLAS provide this capability. What makes DirectML special?

There's no support for CLBlast in whisper.cpp anymore, and where it was used, it was slower. Please see the latest comparison I added. DirectML is the best choice for optimization on Windows: it runs faster on AMD, Nvidia, CPU, and more. The current performance of whisper.cpp on Windows clearly demonstrates the need for this optimization.

thewh1teagle avatar Jul 21 '24 17:07 thewh1teagle