Results: 17 comments of mans

Yes. The embeddings are saved in the state_dict of the checkpoint.
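For reference, a minimal sketch of where to find them, assuming a single-file PyTorch checkpoint and HF-style key names (the path and the exact keys are illustrative and may differ per model):

```python
import torch

# Illustrative checkpoint path and HF-style key names; sharded checkpoints
# or other architectures may use different files/keys.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

embed_tokens_w = state_dict["model.embed_tokens.weight"]
lm_head_w = state_dict["lm_head.weight"]
print(embed_tokens_w.shape)  # (vocab_size, hidden_size)
print(lm_head_w.shape)       # (vocab_size, hidden_size)
```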

Thank you for your feedback. We will follow up on this issue.

> ```python
> new_embed_tokens_w = torch.zeros([NEW_VOCAB_SIZE, HIDDEN_SIZE])
> new_embed_tokens_w[:NEW_VOCAB_SIZE] = embed_tokens_w
> ```

Your code initializes the embedding and head weights of the new tokens to zero, which may lead to optimization problems. A more reasonable way may...

> @mmmans It's mainly because the NormHead type doesn't match, so I can't call `model.resize_token_embeddings()` directly. Following the hint in #49, I modified the data in the weight file directly; the model now loads fine, but training runs into problems. Any advice would be appreciated, thanks!

Can you provide the logits that lead to the zero loss?
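For context, this is the kind of information being asked for; a minimal sketch of how to dump it, assuming an HF-style causal LM whose forward returns `loss` and `logits` and labels masked with `-100` (the function name is illustrative):

```python
import torch

def debug_zero_loss(model, input_ids, labels):
    """Print logits/label statistics for a batch whose loss collapses to zero."""
    outputs = model(input_ids=input_ids, labels=labels)
    logits = outputs.logits  # (batch, seq_len, vocab_size)

    print("loss:", outputs.loss.item())
    print("logits mean/std:", logits.mean().item(), logits.std().item())
    print("NaN/Inf in logits:", torch.isnan(logits).any().item(), torch.isinf(logits).any().item())
    # A loss of exactly zero is often a labeling problem: if every position is
    # masked with -100, no token contributes to the cross-entropy at all.
    print("unmasked label tokens:", (labels != -100).sum().item())
```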

> Added the following lines:
>
> ```python
> new_embed_tokens_w[OLD_VOCAB_SIZE:NEW_VOCAB_SIZE] = embed_tokens_w.mean(dim=0, keepdim=True)
> new_lm_head_w[OLD_VOCAB_SIZE:NEW_VOCAB_SIZE] = lm_head_w.mean(dim=0, keepdim=True)
> ```
>
> and now everything's looking good after 0.1x epoch. will...
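Putting the two comments above together, a minimal end-to-end sketch of the manual resize when `model.resize_token_embeddings()` cannot be used; the vocabulary/hidden sizes, file name, and state_dict keys are illustrative, not the exact script from this thread:

```python
import torch

# Illustrative sizes and key names; adapt them to the actual checkpoint.
OLD_VOCAB_SIZE, NEW_VOCAB_SIZE, HIDDEN_SIZE = 125696, 130000, 5120

state_dict = torch.load("pytorch_model.bin", map_location="cpu")
embed_tokens_w = state_dict["model.embed_tokens.weight"]   # (OLD_VOCAB_SIZE, HIDDEN_SIZE)
lm_head_w = state_dict["lm_head.weight"]

new_embed_tokens_w = torch.zeros(NEW_VOCAB_SIZE, HIDDEN_SIZE, dtype=embed_tokens_w.dtype)
new_lm_head_w = torch.zeros(NEW_VOCAB_SIZE, HIDDEN_SIZE, dtype=lm_head_w.dtype)

# Copy the original rows, then initialize the added rows with the mean of the
# existing weights instead of zeros, avoiding the optimization problem above.
new_embed_tokens_w[:OLD_VOCAB_SIZE] = embed_tokens_w
new_lm_head_w[:OLD_VOCAB_SIZE] = lm_head_w
new_embed_tokens_w[OLD_VOCAB_SIZE:NEW_VOCAB_SIZE] = embed_tokens_w.mean(dim=0, keepdim=True)
new_lm_head_w[OLD_VOCAB_SIZE:NEW_VOCAB_SIZE] = lm_head_w.mean(dim=0, keepdim=True)

state_dict["model.embed_tokens.weight"] = new_embed_tokens_w
state_dict["lm_head.weight"] = new_lm_head_w
torch.save(state_dict, "pytorch_model_resized.bin")
```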

> I just commented out the upcast to fp32 and it worked; each of the two GPUs used 70GB of memory. One thing I'm still puzzled about: why does inference here need so much GPU memory? The example on HuggingFace only used 15GB/GPU. Is it because the text being processed here is very long?

Do you have code to reproduce this?
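For readers wondering where the memory goes: in many decoder implementations the attention scores are upcast to fp32 before the softmax (e.g. `torch.softmax(attn_weights, dim=-1, dtype=torch.float32)`), and that score tensor grows with the square of the sequence length, so very long inputs dominate GPU memory. A rough back-of-the-envelope sketch, with a head count chosen to roughly match a 13B model (this is not the exact Baichuan2 source):

```python
def attention_scores_gib(batch, num_heads, seq_len, bytes_per_elem):
    """Memory of one layer's (batch, heads, seq, seq) attention score tensor."""
    return batch * num_heads * seq_len * seq_len * bytes_per_elem / 1024**3

# Upcasting the scores to fp32 doubles the per-element cost (2 -> 4 bytes).
for seq_len in (2048, 8192, 32768):
    fp16 = attention_scores_gib(1, 40, seq_len, 2)
    fp32 = attention_scores_gib(1, 40, seq_len, 4)
    print(f"seq_len={seq_len}: ~{fp16:.1f} GiB in fp16 vs ~{fp32:.1f} GiB in fp32")
```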

It seems the model is training in eval mode. Check /home/uos/.cache/huggingface/modules/transformers_modules/Baichuan2-13B-Chat/modeling_baichuan.py, line 354. Maybe you should call model.train() before training.
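A minimal sketch of what that looks like in a hand-written loop (Trainer-based scripts handle this themselves); the function and argument names are placeholders:

```python
def train_loop(model, train_dataloader, optimizer):
    # Put the model in train mode so dropout and any `self.training`-gated
    # branches in the modeling code behave as intended during fine-tuning.
    model.train()
    for batch in train_dataloader:
        outputs = model(**batch)
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```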