LM-Infinite

Implementation of paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"

5 LM-Infinite issues

```python
if kv_seq_len > local_branch + global_branch and use_lambda_mask:
    past_key_value = (
        torch.cat([
            key_states[..., :global_branch, :],
            key_states[..., -local_branch:, :],
        ], dim=-2),
        torch.cat([
            value_states[..., :global_branch, :],
            value_states[..., -local_branch:, :],
        ], ...
```
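The snippet above truncates the KV cache to a Λ-shaped window: the first `global_branch` positions (the "global branch") plus the most recent `local_branch` positions (the "local branch"). A minimal plain-Python sketch of that selection logic, with names borrowed from the snippet (the actual code performs the same slicing on `torch` tensors along the sequence dimension):

```python
def lambda_truncate(cache, global_branch, local_branch):
    """Keep the first `global_branch` and the last `local_branch` entries,
    mirroring the Lambda-shaped KV-cache truncation shown above.
    Toy illustration on a Python list, not the repository's implementation."""
    if len(cache) > global_branch + local_branch:
        return cache[:global_branch] + cache[-local_branch:]
    return cache

# Example: 10 cached positions, keep 3 global + 4 local
cache = list(range(10))
print(lambda_truncate(cache, 3, 4))  # → [0, 1, 2, 6, 7, 8, 9]
```

Positions 3, 4, and 5 are dropped from the cache: they are neither among the earliest "attention sink" tokens nor inside the recent local window.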

Hi, thanks for the nice work! I tried to use the following code to enable LM-Infinite for Llama, following the README:

```python
model = LlamaForCausalLM.from_pretrained(
    'meta-llama/Llama-2-7b-chat-hf',
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    low_cpu_mem_usage=True,
)
from models.llama import ...
```

[llama.py#L144](https://github.com/Glaciohound/LM-Infinite/blob/0caf81fe351975978ba79fc6d8bc5aeaa91d0e63/models/llama.py#L144) calls into transformers' models/llama/modeling_llama.py, where the check `if seq_len > self.max_seq_len_cached` raises `RuntimeError: Boolean value of Tensor with more than one value is ambiguous`. The argument passed here is wrong. transformers version: 4.32.1.
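This error typically means a multi-element tensor (e.g. a tensor of positions) was passed where a plain Python integer sequence length was expected: the `>` comparison then runs element-wise, and Python cannot truth-test the resulting boolean tensor. A torch-free sketch of the failure mode, using a hypothetical toy `Tensor` class to mimic PyTorch's behavior:

```python
class Tensor:
    """Toy stand-in (hypothetical, not PyTorch) showing why
    `if tensor > n:` fails: comparison is element-wise, and
    truth-testing a multi-element result is ambiguous."""
    def __init__(self, values):
        self.values = list(values)
    def __gt__(self, other):
        # element-wise compare returns another multi-element "tensor"
        return Tensor(v > other for v in self.values)
    def __bool__(self):
        if len(self.values) != 1:
            raise RuntimeError(
                "Boolean value of Tensor with more than one value is ambiguous")
        return bool(self.values[0])

seq_len = Tensor([0, 1, 2, 3])   # a position tensor passed where an int belongs
try:
    if seq_len > 2:              # element-wise compare, then ambiguous bool()
        pass
except RuntimeError as e:
    print(e)
```

The fix on the caller's side is to pass the scalar sequence length (an `int`) rather than a tensor of position ids into the rotary-embedding path.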

Can this method be used together with vLLM? If so, could you provide example code?

Can it be used directly with a LLaMA model that is not the base model but has been fine-tuned with my own PEFT method?