Chandler-Bing

7 issues by Chandler-Bing

Hello author, after training on my own data, the P, R, and F1 values in predict_scroe.txt under the output folder are all 0, yet label_test.txt still contains instances that match exactly. What could be causing this? ![image](https://user-images.githubusercontent.com/29994840/77621112-c3cc6d80-6f76-11ea-85f6-a6448482429b.png)
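For reference, a minimal sketch of how exact-match P, R, and F1 are typically computed; the set-based pairing and the helper name `prf1` are illustrative assumptions, not this repo's actual scorer:

```python
# Hypothetical exact-match P/R/F1 over aligned prediction/gold label lists;
# an illustration of the standard formulas, not the repo's scoring code.
def prf1(pred_labels, gold_labels):
    pred = set(enumerate(pred_labels))   # (position, label) pairs
    gold = set(enumerate(gold_labels))
    tp = len(pred & gold)                # exact matches
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

If the scorer compares labels by exact string match, even a small formatting mismatch between the prediction and gold files can drive all three metrics to 0 despite visually correct instances.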

Hello, regarding the training data: if `1,2,3 \t 4,5,6 \t 7,8,9` represents one sample, are 1, 2, 3 the ids of the seed words? And are 4, 5, 6 the ids of three documents? How are those ids assigned, randomly? I don't quite understand how the documents are processed here. Do they need to be tokenized first? The step of mapping them to ids is unclear to me. If you have time, please explain. Thank you!
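A minimal parsing sketch under the format guessed in the question (three tab-separated fields, each a comma-separated id list); the id-assignment scheme shown is a common convention, not necessarily this repo's pipeline:

```python
# Parse one sample line under the assumed format "1,2,3\t4,5,6\t7,8,9".
line = "1,2,3\t4,5,6\t7,8,9"
seed_ids, doc_ids, extra_ids = (
    [int(tok) for tok in field.split(",")] for field in line.strip().split("\t")
)

# A common way such ids arise: tokenize the documents first, then map each
# distinct token to an integer in order of first appearance (not randomly).
vocab = {}
def to_ids(tokens):
    return [vocab.setdefault(t, len(vocab)) for t in tokens]

print(to_ids("the cat sat on the mat".split()))  # [0, 1, 2, 3, 0, 4]
```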

### Required prerequisites

- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/baichuan-inc/baichuan-7B/issues) and [Discussions](https://github.com/baichuan-inc/baichuan-7B/discussions) and confirmed this hasn't already been reported. (+1 or comment...

question

### System Info

pass

### Who can help?

@ArthurZucker

### Information

- [X] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An...

**Describe the bug** When fine-tuning my model with deepspeed==0.13.5 and the Hugging Face Trainer, the loss and grad_norm become NaN at step 2. ![image](https://github.com/microsoft/DeepSpeed/assets/29994840/cf2d7a6b-91df-43d6-9706-aa82c2dbf074) The two approaches below work around the problem...
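The concrete workarounds are truncated above; as a diagnostic aid only, here is a sketch of a Hugging Face `TrainerCallback` that flags NaN `loss`/`grad_norm` as soon as they are logged. The callback name is hypothetical, and the `grad_norm` key only appears in recent transformers versions:

```python
import math
from transformers import TrainerCallback

class NanAlertCallback(TrainerCallback):
    """Hypothetical helper: stop training when a logged metric turns NaN."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        for key in ("loss", "grad_norm"):  # grad_norm logged by recent transformers
            value = (logs or {}).get(key)
            if value is not None and math.isnan(float(value)):
                print(f"NaN {key} at step {state.global_step}")
                control.should_training_stop = True
        return control

# Usage: Trainer(..., callbacks=[NanAlertCallback()])
```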

bug
training

Why does [compute_loss](https://github.com/SkyworkAI/Skywork/blob/main/eval/eval_loss.py#L32) calculate the loss manually here? Normally, passing label_ids into the model makes it return the loss itself.
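For context, a minimal sketch of the pattern the question refers to: when `labels` are supplied, a Hugging Face causal-LM forward pass returns a token-averaged loss on its own (the `gpt2` checkpoint here is just an illustrative example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("hello world", return_tensors="pt")
with torch.no_grad():
    # With labels supplied, the model shifts them internally and returns
    # the mean cross-entropy over non-ignored tokens as outputs.loss.
    outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss.item())
```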

Shouldn't [the loss computation here](https://github.com/THUDM/LongAlign/blob/main/modeling_chatglm.py#L900) be normalized, i.e. `loss = (loss * shift_weights).sum()` -> `loss = (loss * shift_weights).sum() / shift_weights.sum()`, so the loss is normalized to token granularity? With the former, the loss scale is too large and the backpropagated gradients are correspondingly large. In the extreme case where every sample has only one token, the batch loss blows up.
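A toy PyTorch illustration of the proposed change (tensor shapes and values are made up): without dividing by `shift_weights.sum()`, the batch loss grows with the number of weighted tokens, whereas the normalized form stays at token granularity:

```python
import torch

per_token_loss = torch.rand(16)   # pretend per-token cross-entropy
shift_weights = torch.ones(16)    # 1 = real token, 0 = padding
shift_weights[12:] = 0.0

unnormalized = (per_token_loss * shift_weights).sum()  # scales with token count
normalized = unnormalized / shift_weights.sum()        # mean over real tokens

print(unnormalized.item(), normalized.item())
```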