llama : support RWKV v6 models
This should fix #846.
Added:
ggml:
- Added unary operation `Exp`
- Added `rwkv_wkv` operation with CPU impl (see the reference sketch after this list)
- Added `rwkv_token_shift` operation with CPU impl to handle multiple sequences in parallel (may not be necessary after #8526 is done)
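For context, `rwkv_wkv` implements the RWKV linear-attention recurrence; in v6 the per-channel decay `w` is data-dependent rather than a fixed parameter, and `rwkv_token_shift` carries the one-token-delayed activations (each layer mixes x_t with x_{t-1}) across the sequences of a batch. Below is a minimal single-head reference sketch of the recurrence in plain C++; the function name, memory layout, and the omission of heads and batching are simplifying assumptions for illustration, not the actual ggml kernel:

```cpp
#include <algorithm>
#include <vector>

// Single-head reference of the RWKV v6 WKV recurrence:
//   y_t = r_t * (diag(u) * k_t^T v_t + S),   S <- diag(w_t) * S + k_t^T v_t
// T = tokens, D = head size; all matrices stored row-major.
void wkv6_ref(int T, int D,
              const std::vector<float> & r,  // T x D receptance
              const std::vector<float> & k,  // T x D key
              const std::vector<float> & v,  // T x D value
              const std::vector<float> & w,  // T x D per-channel decay in (0, 1)
              const std::vector<float> & u,  // D     "bonus" for the current token
              std::vector<float>       & S,  // D x D recurrent state, updated in place
              std::vector<float>       & y)  // T x D output
{
    std::fill(y.begin(), y.end(), 0.0f);
    for (int t = 0; t < T; ++t) {
        for (int i = 0; i < D; ++i) {      // key / state-row index
            for (int j = 0; j < D; ++j) {  // value / state-column index
                const float kv = k[t*D + i] * v[t*D + j];
                // current token contributes via the bonus u, history via the state
                y[t*D + j] += r[t*D + i] * (u[i] * kv + S[i*D + j]);
                // decay the state, then add the current key-value outer product
                S[i*D + j] = w[t*D + i] * S[i*D + j] + kv;
            }
        }
    }
}
```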
llama.cpp:
- `rwkv_world` tokenizer support (by @LaylBongers; see the sketch below)
- `convert_hf_to_gguf.py` support for converting RWKV v6 HF models
- RWKV v6 graph building
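The RWKV `world` vocabulary is not byte-pair based; as I understand it, tokenization is a greedy longest-match against a fixed byte-level vocabulary (typically done with a trie). A rough sketch of that scheme, where the function name, the `std::map` lookup, and the `max_token_len` parameter are stand-ins for illustration rather than the actual llama.cpp code:

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Greedy longest-match tokenization over a fixed vocabulary.
// A std::map stands in for the trie a real implementation would use.
std::vector<int32_t> tokenize_world(const std::string & text,
                                    const std::map<std::string, int32_t> & vocab,
                                    size_t max_token_len) {
    std::vector<int32_t> out;
    size_t pos = 0;
    while (pos < text.size()) {
        // try the longest possible piece first, then shrink until a vocab hit
        size_t len = std::min(max_token_len, text.size() - pos);
        for (; len > 0; --len) {
            auto it = vocab.find(text.substr(pos, len));
            if (it != vocab.end()) {
                out.push_back(it->second);
                break;
            }
        }
        // the vocab contains every single byte, so len == 0 should not happen;
        // skip one byte defensively if it does
        pos += std::max<size_t>(len, 1);
    }
    return out;
}
```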
TODO:
- Adjust the implementation accordingly once #8526 is ready
- Add a CUDA or Metal implementation for the `rwkv_wkv` operation (see the note after this list)
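On the GPU TODO: the WKV recurrence is sequential over tokens but independent across output columns, heads, and sequences, so a kernel could assign one thread per state column. A plain C++ stand-in sketching that decomposition, assuming the same layout as the reference above (illustrative only, not the eventual CUDA/Metal kernel):

```cpp
// One "thread" owns output column j of one head of one sequence.
// S_col holds the D state entries S[i][j]; the t loop stays sequential
// because the state carries across tokens.
void wkv6_column(int T, int D, int j,
                 const float * r, const float * k, const float * v,
                 const float * w, const float * u,
                 float * S_col,  // D entries: column j of the state
                 float * y)      // T x D output
{
    for (int t = 0; t < T; ++t) {
        float acc = 0.0f;
        for (int i = 0; i < D; ++i) {
            const float kv = k[t*D + i] * v[t*D + j];
            acc += r[t*D + i] * (u[i] * kv + S_col[i]);
            S_col[i] = w[t*D + i] * S_col[i] + kv;
        }
        y[t*D + j] = acc;
    }
}
```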
- [x] I have read the contributing guidelines
- Self-reported review complexity:
  - [x] Medium