
llama : support RWKV v6 models

MollySophia opened this issue 6 months ago

This should fix #846.

Added:

ggml:

  • Added unary operation Exp
  • Added rwkv_wkv operation with CPU impl
  • Added rwkv_token_shift operation with CPU impl to handle multiple sequences in parallel (may not be necessary after #8526 is done)
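For reference, the new rwkv_wkv operation implements the per-head RWKV-6 linear-attention recurrence. A rough numpy sketch of that recurrence follows, assuming the standard RWKV-6 formulation (per-token, data-dependent decay w and a "bonus" u for the current token); the function name and tensor layout here are illustrative, not the actual ggml API, which operates on flattened per-head tensors:

```python
import numpy as np

def rwkv_wkv(r, k, v, w, u, S):
    """One head of the RWKV-6 wkv recurrence (reference sketch).

    r, k, v, w: (T, N) per-token receptance, key, value, decay
    u:          (N,)   bonus applied to the current token's k-v product
    S:          (N, N) recurrent state carried across tokens
    """
    T, N = r.shape
    y = np.zeros((T, N))
    for t in range(T):
        kv = np.outer(k[t], v[t])                 # rank-1 update for token t
        y[t] = r[t] @ (np.diag(u) @ kv + S)       # read state, plus u-weighted current kv
        S = np.diag(w[t]) @ S + kv                # v6: decay w is data-dependent per token
    return y, S
```

A CUDA or Metal kernel (see the TODO below) would parallelize over heads and the N dimension while keeping the sequential loop over t, since S at step t depends on step t-1.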

llama.cpp:

  • rwkv_world tokenizer support (by @LaylBongers)
  • convert_hf_to_gguf.py support for converting RWKV v6 HF models
  • RWKV v6 graph building
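The graph building above relies on token shift: each position is mixed with the previous token's embedding, so a per-sequence shift state must be carried between decode calls (which is what rwkv_token_shift handles when several sequences share a batch). A minimal sketch of the idea for a single sequence, with illustrative names rather than the actual llama.cpp internals:

```python
import numpy as np

def token_shift(x, prev_tok):
    """Shift a sequence chunk by one token (reference sketch).

    x:        (T, C) embeddings for the current chunk of one sequence
    prev_tok: (C,)   embedding of the last token from the previous chunk
    Returns the shifted chunk and the new shift state to carry forward.
    """
    shifted = np.vstack([prev_tok[None, :], x[:-1]])  # position t sees token t-1
    return shifted, x[-1].copy()                      # last token becomes the new state
```

With multiple sequences in a batch, each sequence keeps its own prev_tok state, which is why a dedicated op (or the sequence-state rework in #8526) is needed rather than a plain tensor shift.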

TODO:

  • Adjust the implementation accordingly once #8526 is ready
  • Add a CUDA or Metal implementation of the rwkv_wkv operation

MollySophia, Aug 11 '24 02:08