CLEX
[ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models
Could you please tell me what the parameters for training each model in train_lm.sh are? Thank you!
Hi, thanks for your excellent work on length extrapolation! I wonder if there is an open source checkpoint of CLEX-LLaMA-2-7B-4K.
Hello, thank you for your work! I inserted your clex layer into my model, implemented as follows:

```python
class Encoder(nn.Module):
    def __init__(self, config):
        # ... omitted ...
        elif config.my_info_dict.get("algorithm", False) == "clex":
            from .clex_layer import CLEXScalingRotaryEmbedding
            rope_scaling = {
                "factor": 1,
                "max_factor": 64,
                "param_factor": 1,
                "time_dt": 0.01,
                "type": "clex",
                "act": "tanh",
            }
            self.clex_layer = CLEXScalingRotaryEmbedding(
                config.attention_key_size,
                self.config.my_info_dict["train_len"],
                rope_scaling,
            )
        # ... omitted ...

    def forward(...
```