Results: 17 comments of mans

Yes. The embeddings are saved in the state_dict of the checkpoint.
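For reference, a minimal sketch of where to find them, assuming a single-file PyTorch checkpoint and HF-style key names (the path and the exact keys are illustrative and may differ per model):

```python
import torch

# Illustrative checkpoint path and HF-style key names; sharded checkpoints
# or other architectures may use different files/keys.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

embed_tokens_w = state_dict["model.embed_tokens.weight"]
lm_head_w = state_dict["lm_head.weight"]
print(embed_tokens_w.shape)  # (vocab_size, hidden_size)
print(lm_head_w.shape)       # (vocab_size, hidden_size)
```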

Thank you for your feedback. We will follow up on this issue.

> ```python
> new_embed_tokens_w = torch.zeros([NEW_VOCAB_SIZE, HIDDEN_SIZE])
> new_embed_tokens_w[:NEW_VOCAB_SIZE] = embed_tokens_w
> ```

Your code initializes the embedding and head weights of the new tokens to zero, which may lead to optimization problems. A more reasonable way may...

> @mmmans It's mainly because the NormHead type doesn't match, so I can't call `model.resize_token_embeddings()` directly. Following the hint in #49, I modified the data in the weight file directly; the model now loads fine, but training runs into problems. Any advice would be appreciated, thanks!

Can you provide the logits that lead to the zero loss?
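For context, this is the kind of information being asked for; a minimal sketch of how to dump it, assuming an HF-style causal LM whose forward returns `loss` and `logits` and labels masked with `-100` (the function name is illustrative):

```python
import torch

def debug_zero_loss(model, input_ids, labels):
    """Print logits/label statistics for a batch whose loss collapses to zero."""
    outputs = model(input_ids=input_ids, labels=labels)
    logits = outputs.logits  # (batch, seq_len, vocab_size)

    print("loss:", outputs.loss.item())
    print("logits mean/std:", logits.mean().item(), logits.std().item())
    print("NaN/Inf in logits:", torch.isnan(logits).any().item(), torch.isinf(logits).any().item())
    # A loss of exactly zero is often a labeling problem: if every position is
    # masked with -100, no token contributes to the cross-entropy at all.
    print("unmasked label tokens:", (labels != -100).sum().item())
```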

> Added the following lines:
>
> ```python
> new_embed_tokens_w[OLD_VOCAB_SIZE:NEW_VOCAB_SIZE] = embed_tokens_w.mean(dim=0, keepdim=True)
> new_lm_head_w[OLD_VOCAB_SIZE:NEW_VOCAB_SIZE] = lm_head_w.mean(dim=0, keepdim=True)
> ```
>
> and now everything's looking good after 0.1x epoch. will...
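Putting the two comments above together, a minimal end-to-end sketch of the manual resize when `model.resize_token_embeddings()` cannot be used; the vocabulary/hidden sizes, file name, and state_dict keys are illustrative, not the exact script from this thread:

```python
import torch

# Illustrative sizes and key names; adapt them to the actual checkpoint.
OLD_VOCAB_SIZE, NEW_VOCAB_SIZE, HIDDEN_SIZE = 125696, 130000, 5120

state_dict = torch.load("pytorch_model.bin", map_location="cpu")
embed_tokens_w = state_dict["model.embed_tokens.weight"]   # (OLD_VOCAB_SIZE, HIDDEN_SIZE)
lm_head_w = state_dict["lm_head.weight"]

new_embed_tokens_w = torch.zeros(NEW_VOCAB_SIZE, HIDDEN_SIZE, dtype=embed_tokens_w.dtype)
new_lm_head_w = torch.zeros(NEW_VOCAB_SIZE, HIDDEN_SIZE, dtype=lm_head_w.dtype)

# Copy the original rows, then initialize the added rows with the mean of the
# existing weights instead of zeros, avoiding the optimization problem above.
new_embed_tokens_w[:OLD_VOCAB_SIZE] = embed_tokens_w
new_lm_head_w[:OLD_VOCAB_SIZE] = lm_head_w
new_embed_tokens_w[OLD_VOCAB_SIZE:NEW_VOCAB_SIZE] = embed_tokens_w.mean(dim=0, keepdim=True)
new_lm_head_w[OLD_VOCAB_SIZE:NEW_VOCAB_SIZE] = lm_head_w.mean(dim=0, keepdim=True)

state_dict["model.embed_tokens.weight"] = new_embed_tokens_w
state_dict["lm_head.weight"] = new_lm_head_w
torch.save(state_dict, "pytorch_model_resized.bin")
```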

> I just commented out the upcast to fp32 and it worked; each of the two GPUs used 70GB of memory. One thing I'm still puzzled about: why does inference here need so much GPU memory? The example on HuggingFace only used 15GB/GPU. Is it because the text being processed here is very long?

Do you have code to reproduce this?
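For readers wondering where the memory goes: in many decoder implementations the attention scores are upcast to fp32 before the softmax (e.g. `torch.softmax(attn_weights, dim=-1, dtype=torch.float32)`), and that score tensor grows with the square of the sequence length, so very long inputs dominate GPU memory. A rough back-of-the-envelope sketch, with a head count chosen to roughly match a 13B model (this is not the exact Baichuan2 source):

```python
def attention_scores_gib(batch, num_heads, seq_len, bytes_per_elem):
    """Memory of one layer's (batch, heads, seq, seq) attention score tensor."""
    return batch * num_heads * seq_len * seq_len * bytes_per_elem / 1024**3

# Upcasting the scores to fp32 doubles the per-element cost (2 -> 4 bytes).
for seq_len in (2048, 8192, 32768):
    fp16 = attention_scores_gib(1, 40, seq_len, 2)
    fp32 = attention_scores_gib(1, 40, seq_len, 4)
    print(f"seq_len={seq_len}: ~{fp16:.1f} GiB in fp16 vs ~{fp32:.1f} GiB in fp32")
```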

It seems the model is training in eval mode. Check /home/uos/.cache/huggingface/modules/transformers_modules/Baichuan2-13B-Chat/modeling_baichuan.py, line 354. Maybe you should call model.train() before training.
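A minimal sketch of what that looks like in a hand-written loop (Trainer-based scripts handle this themselves); the function and argument names are placeholders:

```python
def train_loop(model, train_dataloader, optimizer):
    # Put the model in train mode so dropout and any `self.training`-gated
    # branches in the modeling code behave as intended during fine-tuning.
    model.train()
    for batch in train_dataloader:
        outputs = model(**batch)
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```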