AlpinDale

75 issues by AlpinDale

Trying to write custom kernels for every sampling op here.
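For reference, here is a slow pure-Python sketch of what a few common sampling ops compute: temperature scaling, top-k filtering, and top-p (nucleus) filtering. The summary does not say which sampling ops are covered, so this choice of ops is an assumption. The function names (`apply_temperature`, `top_k_mask`, `top_p_mask`, `sample`) are hypothetical and do not come from Aphrodite.

```python
import math
import random
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # exp(-inf) == 0.0 for masked tokens
    total = sum(exps)
    return [e / total for e in exps]


def apply_temperature(logits: List[float], temperature: float) -> List[float]:
    """Divide logits by the temperature (hypothetical helper)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return [x / temperature for x in logits]


def top_k_mask(logits: List[float], k: int) -> List[float]:
    """Keep the k largest logits and set the rest to -inf (ties at the cutoff are kept)."""
    if k <= 0 or k >= len(logits):
        return list(logits)
    cutoff = sorted(logits, reverse=True)[k - 1]
    return [x if x >= cutoff else -math.inf for x in logits]


def top_p_mask(logits: List[float], p: float) -> List[float]:
    """Keep the smallest set of most-likely tokens whose probability mass reaches p."""
    probs = softmax(logits)
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    keep = set()
    cumulative = 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return [x if i in keep else -math.inf for i, x in enumerate(logits)]


def sample(
    logits: List[float],
    temperature: float = 1.0,
    top_k: int = 0,
    top_p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Apply temperature, top-k and top-p in sequence, then draw one token id."""
    rng = rng or random.Random()
    scaled = apply_temperature(logits, temperature)
    scaled = top_k_mask(scaled, top_k)
    if top_p < 1.0:
        scaled = top_p_mask(scaled, top_p)
    probs = softmax(scaled)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In this sketch each step makes its own pass over the vocabulary. That kind of overhead is what custom, possibly fused, sampling kernels can try to avoid.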

Work-in-progress, experimental Vulkan backend for Aphrodite, with custom compute shaders, inspired by ggml-vulkan. The work is still at an early stage: setting up the pipeline, the runtime, and other basic functionality. I don't...

Adds native support for Apple M-series GPUs, with kernels written in the Metal Shading Language. Attention is currently implemented through PyTorch SDPA's MPS backend and through custom paged-attention Metal kernels. To...
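Paged attention stores the KV cache in fixed-size physical blocks. A per-sequence block table maps each logical token position to a block and an offset. The pure-Python sketch below illustrates that idea and is not the Metal kernel. It assumes a single attention head, a single query, and one key/value vector per token. The class and method names (`PagedKVCache`, `append`, `attend`) are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


class PagedKVCache:
    """Toy paged KV cache: fixed-size physical blocks plus per-sequence block tables."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        # Each physical block holds up to block_size key and value vectors.
        self.keys: List[List[Vector]] = [[] for _ in range(num_blocks)]
        self.values: List[List[Vector]] = [[] for _ in range(num_blocks)]
        self.free_blocks = list(range(num_blocks))
        self.block_tables: Dict[int, List[int]] = {}
        self.lengths: Dict[int, int] = {}

    def append(self, seq_id: int, key: Vector, value: Vector) -> None:
        """Store one token's key/value, allocating a new block when the last one is full."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise MemoryError("out of KV cache blocks")
            table.append(self.free_blocks.pop(0))
        block = table[length // self.block_size]
        self.keys[block].append(key)
        self.values[block].append(value)
        self.lengths[seq_id] = length + 1

    def attend(self, seq_id: int, query: Vector) -> Vector:
        """Scaled dot-product attention of one query over the sequence's cached tokens."""
        scale = 1.0 / math.sqrt(len(query))
        scores: List[float] = []
        vals: List[Vector] = []
        for pos in range(self.lengths[seq_id]):
            # Translate the logical position into (physical block, offset in block).
            block = self.block_tables[seq_id][pos // self.block_size]
            offset = pos % self.block_size
            key = self.keys[block][offset]
            scores.append(scale * sum(q * k for q, k in zip(query, key)))
            vals.append(self.values[block][offset])
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        dim = len(vals[0])
        return [sum(w * v[d] for w, v in zip(weights, vals)) / total for d in range(dim)]
```

Blocks are allocated on demand. When sequences are interleaved, their block tables therefore end up non-contiguous, and `attend` reaches a token's key and value only through the table.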

Just some tests for running exl3 models. The kernels currently produce NaNs, so it's nowhere near ready, but weight loading works.

Our prefill is currently 8x slower than native llama.cpp. This is an attempt at closing that gap.

Llama 3.1 8B Q4_K_M, RTX 3090

**Main**

```
Request completed - E2E time:...
```