CLEX
[ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models
Could you please tell me what the parameters for training each model in train_lm.sh are? Thank you!
Hi, thanks for your excellent work on length extrapolation! I wonder if there is an open source checkpoint of CLEX-LLaMA-2-7B-4K.
Hello, thank you for your work! I inserted your clex layer into my model, implemented as follows:

```python
class Encoder(nn.Module):
    def __init__(self, config):
        # ... omitted ...
        elif config.my_info_dict.get("algorithm", False) == "clex":
            from .clex_layer import CLEXScalingRotaryEmbedding
            rope_scaling = {
                "factor": 1,
                "max_factor": 64,
                "param_factor": 1,
                "time_dt": 0.01,
                "type": "clex",
                "act": "tanh",
            }
            self.clex_layer = CLEXScalingRotaryEmbedding(
                config.attention_key_size,
                self.config.my_info_dict["train_len"],
                rope_scaling,
            )
        # ... omitted ...

    def forward(...
```