LM-Infinite
Implementation of paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
```python
if kv_seq_len > local_branch + global_branch and use_lambda_mask:
    past_key_value = (
        torch.cat([
            key_states[..., :global_branch, :],
            key_states[..., -local_branch:, :],
        ], dim=-2),
        torch.cat([
            value_states[..., :global_branch, :],
            value_states[..., -local_branch:, :],
        ],...
```
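The snippet above keeps only the first `global_branch` positions (the "global" head of the sequence) and the last `local_branch` positions (the sliding local window) of the KV cache. A minimal self-contained sketch of that truncation step, with illustrative parameter defaults that are not taken from the paper:

```python
import torch

def truncate_kv_cache(key_states, value_states, global_branch=10, local_branch=4096):
    """Sketch of LM-Infinite's Lambda-shaped KV truncation: keep the first
    `global_branch` positions and the last `local_branch` positions of the
    cache along the sequence dimension, dropping everything in between.
    Tensors are shaped (batch, heads, seq_len, head_dim) as in the snippet
    above; the function name and defaults are illustrative assumptions."""
    kv_seq_len = key_states.shape[-2]
    if kv_seq_len <= local_branch + global_branch:
        # Cache still fits within the Lambda window; nothing to drop.
        return key_states, value_states
    keys = torch.cat([
        key_states[..., :global_branch, :],
        key_states[..., -local_branch:, :],
    ], dim=-2)
    values = torch.cat([
        value_states[..., :global_branch, :],
        value_states[..., -local_branch:, :],
    ], dim=-2)
    return keys, values
```

After truncation the cache length is bounded by `global_branch + local_branch`, which is what keeps memory constant as generation proceeds.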
Hi, thanks for the nice work! I tried to enable LM-Infinite for Llama with the following code, following the README:

```python
model = LlamaForCausalLM.from_pretrained(
    'meta-llama/Llama-2-7b-chat-hf',
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    low_cpu_mem_usage=True,
)
from models.llama import...
```
[llama.py#L144](https://github.com/Glaciohound/LM-Infinite/blob/0caf81fe351975978ba79fc6d8bc5aeaa91d0e63/models/llama.py#L144) ends up calling the `if seq_len > self.max_seq_len_cached` check in transformers' models/llama/modeling_llama.py, which raises `RuntimeError: Boolean value of Tensor with more than one value is ambiguous`. The argument being passed here is wrong. The transformers version is 4.32.1.
Can this method be used together with vLLM? If so, could you provide example code?
Can this be used directly with a Llama model that is not the base model, but one fine-tuned with my own PEFT method?