litgpt
Qwen2.5 long context
Qwen2.5 7B Instruct and Qwen2.5 14B Instruct extended to 1 million context length
https://qwenlm.github.io/blog/qwen2.5-1m/
Looks great, thank you @ysjprojects. Should we limit the default KV-cache size, though?
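The KV-cache concern is easy to quantify with a back-of-the-envelope calculation. The sketch below is not litgpt code; the model shape values (28 layers, 4 KV heads under GQA, head dim 128) are assumed figures for Qwen2.5-7B-Instruct:

```python
# Rough KV-cache memory for a 1M-token context window.
# Shape values are assumptions for Qwen2.5-7B-Instruct, not read from litgpt.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Total bytes for the K and V caches across all layers (fp16 by default)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

size = kv_cache_bytes(n_layers=28, n_kv_heads=4, head_dim=128, seq_len=1_000_000)
print(f"{size / 1e9:.1f} GB")  # ~57.3 GB at fp16 for the full 1M window
```

At roughly 57 GB in fp16 for a single sequence, allocating the cache for the full 1M context by default would exceed most single-GPU memory budgets, which is why capping the default size is worth considering.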
The PR should be ready to merge; the failing test case is unrelated to the model.
Thank you @ysjprojects