Chandler-Bing

7 issues by Chandler-Bing

Hello author, after training on my own data, the P, R, and F1 values in predict_scroe.txt under the output folder are all 0, yet label_test.txt still contains instances that match exactly. What could be causing this? ![image](https://user-images.githubusercontent.com/29994840/77621112-c3cc6d80-6f76-11ea-85f6-a6448482429b.png)
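For reference, a minimal sketch of how exact-match P, R, and F1 are typically computed; the set-based pairing and the helper name `prf1` are illustrative assumptions, not this repo's actual scorer:

```python
# Hypothetical exact-match P/R/F1 over aligned prediction/gold label lists;
# an illustration of the standard formulas, not the repo's scoring code.
def prf1(pred_labels, gold_labels):
    pred = set(enumerate(pred_labels))   # (position, label) pairs
    gold = set(enumerate(gold_labels))
    tp = len(pred & gold)                # exact matches
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

If the scorer compares labels by exact string match, even a small formatting mismatch between the prediction and gold files can drive all three metrics to 0 despite visually correct instances.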

Hello, regarding the training data: if `1,2,3 \t 4,5,6 \t 7,8,9` represents one sample, are 1, 2, 3 the ids of the seed words? And are 4, 5, 6 the ids of three documents? How are those ids assigned, randomly? I don't quite understand how the documents are processed here. Do they need to be tokenized first? The step of mapping them to ids is unclear to me. If you have time, please explain. Thank you!
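A minimal parsing sketch under the format guessed in the question (three tab-separated fields, each a comma-separated id list); the id-assignment scheme shown is a common convention, not necessarily this repo's pipeline:

```python
# Parse one sample line under the assumed format "1,2,3\t4,5,6\t7,8,9".
line = "1,2,3\t4,5,6\t7,8,9"
seed_ids, doc_ids, extra_ids = (
    [int(tok) for tok in field.split(",")] for field in line.strip().split("\t")
)

# A common way such ids arise: tokenize the documents first, then map each
# distinct token to an integer in order of first appearance (not randomly).
vocab = {}
def to_ids(tokens):
    return [vocab.setdefault(t, len(vocab)) for t in tokens]

print(to_ids("the cat sat on the mat".split()))  # [0, 1, 2, 3, 0, 4]
```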

### Required prerequisites

- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/baichuan-inc/baichuan-7B/issues) and [Discussions](https://github.com/baichuan-inc/baichuan-7B/discussions) and confirmed this hasn't already been reported. (+1 or comment...

question

### System Info

pass

### Who can help?

@ArthurZucker

### Information

- [X] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An...

**Describe the bug** When fine-tuning my model with deepspeed==0.13.5 and the Hugging Face Trainer, the loss and grad_norm become NaN at step 2. ![image](https://github.com/microsoft/DeepSpeed/assets/29994840/cf2d7a6b-91df-43d6-9706-aa82c2dbf074) The two approaches below work around the problem...
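The concrete workarounds are truncated above; as a diagnostic aid only, here is a sketch of a Hugging Face `TrainerCallback` that flags NaN `loss`/`grad_norm` as soon as they are logged. The callback name is hypothetical, and the `grad_norm` key only appears in recent transformers versions:

```python
import math
from transformers import TrainerCallback

class NanAlertCallback(TrainerCallback):
    """Hypothetical helper: stop training when a logged metric turns NaN."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        for key in ("loss", "grad_norm"):  # grad_norm logged by recent transformers
            value = (logs or {}).get(key)
            if value is not None and math.isnan(float(value)):
                print(f"NaN {key} at step {state.global_step}")
                control.should_training_stop = True
        return control

# Usage: Trainer(..., callbacks=[NanAlertCallback()])
```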

bug
training

Why does [compute_loss](https://github.com/SkyworkAI/Skywork/blob/main/eval/eval_loss.py#L32) calculate the loss manually here? Normally, passing label_ids into the model makes it return the loss itself.
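For context, a minimal sketch of the pattern the question refers to: when `labels` are supplied, a Hugging Face causal-LM forward pass returns a token-averaged loss on its own (the `gpt2` checkpoint here is just an illustrative example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("hello world", return_tensors="pt")
with torch.no_grad():
    # With labels supplied, the model shifts them internally and returns
    # the mean cross-entropy over non-ignored tokens as outputs.loss.
    outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss.item())
```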

Shouldn't [the loss computation here](https://github.com/THUDM/LongAlign/blob/main/modeling_chatglm.py#L900) be normalized, i.e. `loss = (loss * shift_weights).sum()` -> `loss = (loss * shift_weights).sum() / shift_weights.sum()`, so the loss is normalized to token granularity? With the former, the loss scale is too large and the backpropagated gradients are correspondingly large. In the extreme case where every sample has only one token, the batch loss blows up.
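A toy PyTorch illustration of the proposed change (tensor shapes and values are made up): without dividing by `shift_weights.sum()`, the batch loss grows with the number of weighted tokens, whereas the normalized form stays at token granularity:

```python
import torch

per_token_loss = torch.rand(16)   # pretend per-token cross-entropy
shift_weights = torch.ones(16)    # 1 = real token, 0 = padding
shift_weights[12:] = 0.0

unnormalized = (per_token_loss * shift_weights).sum()  # scales with token count
normalized = unnormalized / shift_weights.sum()        # mean over real tokens

print(unnormalized.item(), normalized.item())
```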