LM-Infinite
Implementation of paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
```python
if kv_seq_len > local_branch + global_branch and use_lambda_mask:
    past_key_value = (
        torch.cat([
            key_states[..., :global_branch, :],
            key_states[..., -local_branch:, :],
        ], dim=-2),
        torch.cat([
            value_states[..., :global_branch, :],
            value_states[..., -local_branch:, :],
        ],...
```
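The snippet above keeps only the first `global_branch` positions (the "global" head of the sequence) and the last `local_branch` positions (the sliding local window) of the KV cache. A minimal self-contained sketch of that truncation step, with illustrative parameter defaults that are not taken from the paper:

```python
import torch

def truncate_kv_cache(key_states, value_states, global_branch=10, local_branch=4096):
    """Sketch of LM-Infinite's Lambda-shaped KV truncation: keep the first
    `global_branch` positions and the last `local_branch` positions of the
    cache along the sequence dimension, dropping everything in between.
    Tensors are shaped (batch, heads, seq_len, head_dim) as in the snippet
    above; the function name and defaults are illustrative assumptions."""
    kv_seq_len = key_states.shape[-2]
    if kv_seq_len <= local_branch + global_branch:
        # Cache still fits within the Lambda window; nothing to drop.
        return key_states, value_states
    keys = torch.cat([
        key_states[..., :global_branch, :],
        key_states[..., -local_branch:, :],
    ], dim=-2)
    values = torch.cat([
        value_states[..., :global_branch, :],
        value_states[..., -local_branch:, :],
    ], dim=-2)
    return keys, values
```

After truncation the cache length is bounded by `global_branch + local_branch`, which is what keeps memory constant as generation proceeds.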
Hi, thanks for the nice work! I tried to enable LM-Infinite for Llama with the following code, following the README:

```python
model = LlamaForCausalLM.from_pretrained(
    'meta-llama/Llama-2-7b-chat-hf',
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    low_cpu_mem_usage=True,
)
from models.llama import...
```
[llama.py#L144](https://github.com/Glaciohound/LM-Infinite/blob/0caf81fe351975978ba79fc6d8bc5aeaa91d0e63/models/llama.py#L144) ends up calling the `if seq_len > self.max_seq_len_cached` check in transformers' models/llama/modeling_llama.py, which raises `RuntimeError: Boolean value of Tensor with more than one value is ambiguous`. The argument being passed here is wrong. The transformers version is 4.32.1.
Can this method be used together with vLLM? If so, could you provide example code?
Can this be used directly with a Llama model that is not the base model, but one fine-tuned with my own PEFT method?