AlpinDale issues

Results: 75 issues of AlpinDale

[Kernel] feat: add custom CUDA kernels for all sampling ops

Trying to write custom kernels for every sampling op here.

[Kernel][Experimental] feat: add Vulkan backend

Work-in-progress, experimental Vulkan backend for Aphrodite, with custom compute shaders, inspired by ggml-vulkan. The current phase is still experimental: setting up the pipeline, runtime, and other basic functionality. I don't...

[Kernel] feat: add Metal support for Apple Silicon GPU

Adds native support for Apple M-series GPUs, with kernels written in the Metal Shading Language. Attention is currently implemented through Torch SDPA's MPS backend and custom paged-attention Metal kernels. To...

[WIP] feat: ExLlamaV3 quantization format

Just some tests running exl3 models. The kernels currently produce NaN, so it's nowhere near ready, but weight loading works.

gguf: optimize prefill speeds for Q4_K quants

Our prefill is currently 8x slower than native llama.cpp. This is an attempt at closing that gap.

Llama 3.1 8B Q4_K_M, RTX 3090

**Main**: Request completed - E2E time:...