crabml
Only f32 is supported for now. Make a quick and naive implementation first, just to get generation working.
- [x] #186
- [x] #187
- [x] #188
- [ ] #216
https://huggingface.co/apple/OpenELM-270M-Instruct
Mistral's context window is longer than the KV cache, thanks to sliding window attention. We can make the KV cache a ring buffer, so that we can keep chat...
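A minimal sketch of the ring-buffer idea: once the cache holds `window` entries, each new token overwrites the oldest slot, so memory stays bounded while generation continues past the window. The type and method names here are illustrative, not crabml's actual API.

```rust
/// Hypothetical ring-buffer KV cache for sliding window attention.
/// Only keys are shown; values would be handled identically.
struct RingKvCache {
    window: usize,       // sliding window size (cache capacity)
    len: usize,          // total tokens seen so far
    keys: Vec<Vec<f32>>, // one key vector per cached slot
}

impl RingKvCache {
    fn new(window: usize) -> Self {
        Self { window, len: 0, keys: vec![Vec::new(); window] }
    }

    /// Append a key; once the window is full, the oldest slot is overwritten.
    fn push(&mut self, key: Vec<f32>) {
        let slot = self.len % self.window;
        self.keys[slot] = key;
        self.len += 1;
    }

    /// Number of positions currently attendable.
    fn cached(&self) -> usize {
        self.len.min(self.window)
    }
}

fn main() {
    let mut cache = RingKvCache::new(4);
    for t in 0..6 {
        cache.push(vec![t as f32]);
    }
    // Tokens 0 and 1 were overwritten; only the last 4 keys survive.
    assert_eq!(cache.cached(), 4);
    assert_eq!(cache.keys[0], vec![4.0]); // slot 0 now holds token 4
    println!("cached = {}", cache.cached());
}
```

With this layout, attention only ever sees the most recent `window` positions, which matches what sliding window attention needs, and the allocation never grows with sequence length.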
As described in https://arxiv.org/pdf/2309.16609.pdf, the architectural differences from LLaMA are:  Reference: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/models/qwen.py
Are there any speed benchmarks?