DirectML support
Hello, DirectML is a machine learning API built on top of DirectX. With it, models can be run on low-end GPU devices on Windows.
> models can be run on low end gpu devices on windows

Both CLBlast and OpenBLAS provide this capability. What makes DirectML special?
The reason is that Windows supports it natively; the Windows SDK (alongside DirectX) is all you need to compile against it.
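To illustrate the "just the Windows SDK" point, a build setup could look something like the following CMake fragment (a rough sketch; the target name `my_app` is mine, and it assumes a recent Windows SDK, or the Microsoft.AI.DirectML redistributable, that ships `DirectML.h`/`DirectML.lib`):

```cmake
# Minimal sketch: link a Windows target against Direct3D 12 and DirectML.
# No OpenCL, CUDA, or vendor SDK required -- only what the Windows SDK provides.
add_executable(my_app main.cpp)
target_link_libraries(my_app PRIVATE d3d12 dxgi DirectML)
```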
I want to know whether Windows on ARM64 is supported. How can I check?
As far as I know, yes, it's supported. Check this out: it says that DirectML version 1.5 supports it. Also, it's natively available through DirectX, meaning no OpenCL requirement on Windows.
The main benefit of supporting DirectML imo would be support for UWP and Xbox, which afaik don't support any of the other currently implemented backends. Definitely understand that not being a priority, but it would be concretely beneficial for those platforms.
Adding support for DirectML would be great for Windows users. It supports Intel, Nvidia, and AMD GPUs, and more; see DirectML-ExecutionProvider.html#directml-execution-provider. In addition, all that's needed to compile with it is the Windows SDK, and it's natively supported.
@ggerganov Do you plan on adding it? Can you guide contributors here?
We'll take a look, but not sure if it is viable option - see https://github.com/ggerganov/llama.cpp/issues/7772
@ggerganov
I'm pretty sure that it's viable. There's another project that uses an old version of ggml combined with DirectML: Const-me/Whisper.
It transcribes a 20s clip in 10s with the medium model, while whisper.cpp currently takes 60s!
| Project | Model Version | Backend | Transcription time (20s clip) | Hardware |
|---|---|---|---|---|
| Const-me/Whisper | Medium | DirectML | 10s | AMD Ryzen 5 4500U |
| whisper.cpp | Medium | OpenCL | 23s | AMD Ryzen 5 4500U |
| ctranslate2-rs | Medium | Intel MKL | 39s | AMD Ryzen 5 4500U |
| ctranslate2-rs | Medium | ctranslate2 | 42s | AMD Ryzen 5 4500U |
| whisper.cpp | Medium | OpenBLAS | 60s | AMD Ryzen 5 4500U |
| sherpa-rs | Medium | onnxruntime (standard) | 68s | AMD Ryzen 5 4500U |
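To put the table in perspective, here is a quick sketch of the relative speedups, taking the whisper.cpp OpenBLAS run as the baseline (the dict keys are my own labels combining the project and backend columns; the timings are the ones reported above, all on the same Ryzen 5 4500U):

```python
# Reported transcription times (seconds) for a 20s clip, medium model,
# all on an AMD Ryzen 5 4500U (numbers from the table above).
times = {
    "Const-me/Whisper (DirectML)": 10,
    "whisper.cpp (OpenCL)": 23,
    "ctranslate2-rs (Intel MKL)": 39,
    "ctranslate2-rs (ctranslate2)": 42,
    "whisper.cpp (OpenBLAS)": 60,
    "sherpa-rs (onnxruntime)": 68,
}

baseline = times["whisper.cpp (OpenBLAS)"]
for name, t in times.items():
    # Speedup relative to the whisper.cpp OpenBLAS baseline.
    print(f"{name}: {baseline / t:.1f}x vs OpenBLAS baseline")
```

By this measure the DirectML build is about 6x faster than the OpenBLAS build on the same hardware.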
> Both CLBlast and OpenBLAS provide this capability. What makes DirectML special?
There's no support for CLBlast in whisper.cpp anymore, and where it was used, it was slower; please see the latest comparison I added. DirectML is the best choice for optimization on Windows: it will run faster on AMD, Nvidia, CPUs, and more. The current performance of whisper.cpp on Windows clearly demonstrates the need for this optimization.