Chandler-Bing
The resulting F1 is 0
Hello author, after training on my own data, the P, R, and F1 values in predict_scroe.txt under the output folder are all 0, yet label_test.txt still contains instances that match exactly. What could be causing this?
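For reference, span-level P/R/F1 is usually computed by exact matching between the predicted and gold items; the sketch below is an assumed generic scorer, not necessarily the repo's own script. All-zero scores typically mean the two sets share no exact matches, which can happen from a formatting mismatch alone.

```python
# Minimal sketch of exact-match P/R/F1 (assumed; the repo's scorer may differ).
def prf1(pred, gold):
    pred, gold = set(pred), set(gold)
    tp = len(pred & gold)                      # count exact matches only
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Example: a single formatting difference (spacing, casing) makes an item
# count as wrong, which can drive all three metrics to 0.
print(prf1(["New York"], ["new york"]))   # (0.0, 0.0, 0.0)
```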
How are the documents processed?
Hello, regarding the training data: if `1,2,3 \t 4,5,6 \t 7,8,9` represents one sample, and 1,2,3 are the IDs of the seed words, then are 4,5,6 the IDs of 3 documents? How are these IDs assigned, randomly? I don't quite understand how the documents are processed here. Do they need to be word-segmented first? The step of mapping them to IDs is unclear to me (see the parsing sketch below). If you have time, please explain. Thank you!
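To make the question concrete, here is a rough sketch of how such a sample line might be parsed and how words could be mapped to IDs. The meaning of the second and third fields (seed-word IDs vs. document IDs) is exactly what is being asked, so the names and the preprocessing below are assumptions, not the repo's actual pipeline.

```python
# Hypothetical parsing of one training line "1,2,3\t4,5,6\t7,8,9".
# Whether the second/third fields are document IDs is the open question here.
line = "1,2,3\t4,5,6\t7,8,9"
seed_ids, field2, field3 = (
    [int(x) for x in part.split(",")] for part in line.strip().split("\t")
)
print(seed_ids, field2, field3)  # [1, 2, 3] [4, 5, 6] [7, 8, 9]

# A typical word->ID mapping (assumed preprocessing): tokenize each document,
# build a vocabulary, then replace tokens with their indices.
docs = ["machine learning for text", "deep learning models"]
vocab = {}
doc_ids = []
for doc in docs:
    ids = []
    for tok in doc.split():            # word segmentation would go here for Chinese text
        ids.append(vocab.setdefault(tok, len(vocab)))
    doc_ids.append(ids)
print(vocab, doc_ids)
```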
**Describe the bug** When fine-tuning my model with deepspeed==0.13.5 and the Hugging Face Trainer, the loss and grad_norm become NaN at step 2, but the two approaches below resolve the problem...
[compute_loss](https://github.com/SkyworkAI/Skywork/blob/main/eval/eval_loss.py#L32) Why is the loss computed manually here? Normally, when label_ids are passed to the model, it returns the loss directly.
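For context, Hugging Face transformers causal-LM models do return the loss when `labels` is passed; a minimal sketch (the checkpoint name is only an example, not the one used in this repo):

```python
# Passing `labels` makes a transformers causal-LM return the loss directly.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

enc = tok("hello world", return_tensors="pt")
out = model(**enc, labels=enc["input_ids"])
print(out.loss)  # cross-entropy averaged over the labeled tokens
```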
Shouldn't [the loss computation here](https://github.com/THUDM/LongAlign/blob/main/modeling_chatglm.py#L900) be normalized? `loss = (loss * shift_weights).sum()` -> `loss = (loss * shift_weights).sum() / shift_weights.sum()`, i.e. normalize the loss to per-token granularity. With the former, the loss scale is too large, and the backpropagated gradients are correspondingly large. In the extreme case where every sample contributes only 1 token, the batch loss would blow up.
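A small PyTorch sketch of the difference; the name `shift_weights` follows the linked code, but the shapes and values here are made up for illustration:

```python
# Toy comparison of the unnormalized vs. per-token-normalized loss.
import torch

loss = torch.tensor([2.0, 1.5, 0.5, 3.0])          # per-token cross-entropy
shift_weights = torch.tensor([1.0, 1.0, 0.0, 1.0])  # 0 masks out ignored tokens

unnormalized = (loss * shift_weights).sum()                      # grows with token count
normalized = (loss * shift_weights).sum() / shift_weights.sum()  # per-token average

print(unnormalized.item())  # 6.5
print(normalized.item())    # ~2.1667
```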