
Negative loss during training, F1 score stays at 1

Open lightcome opened this issue 5 years ago • 10 comments

Hi, when I use this code, the loss becomes negative right at the start of training and the F1 score stays at 1.

lightcome avatar Nov 26 '19 04:11 lightcome

[screenshot attachment]

lightcome avatar Nov 26 '19 04:11 lightcome

Thanks for your interest in our work. I will check it later. There may be some problems when calculating the evaluation metrics. The negative loss is normal because we use the self-critical algorithm. For stable training, the reward used is (F1 score of the sampled label sequence - F1 score of the greedy-search label sequence), which may be negative. You can refer to https://arxiv.org/abs/1612.00563 for more details.
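For illustration, here is a minimal sketch (not the repository's exact code, helper names are made up) of the self-critical reward described above; since the baseline is the greedy decode, the reward, and hence the logged training loss, can be negative.

    def label_f1(pred_labels, gold_labels):
        """Micro-F1 of one predicted label set against the gold label set."""
        pred, gold = set(pred_labels), set(gold_labels)
        tp = len(pred & gold)
        if tp == 0:
            return 0.0
        precision = tp / len(pred)
        recall = tp / len(gold)
        return 2 * precision * recall / (precision + recall)

    def self_critical_reward(sampled_labels, greedy_labels, gold_labels):
        # Reward = F1(sampled) - F1(greedy); the RL loss is -reward * log p(sampled),
        # so a strong greedy baseline makes the reward (and the logged loss) negative.
        return label_f1(sampled_labels, gold_labels) - label_f1(greedy_labels, gold_labels)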

ypengc7512 avatar Dec 28 '19 14:12 ypengc7512

I ran into the same problem when validating on RCV1-V2: F1 stays at 1 and the loss fluctuates a lot, all with the default parameters:

    time: 638.937, epoch: 1, updates: 500, train loss: -28.491
    hamming_loss: 0.00000000 | macro_f1: 0.0097 | micro_f1: 1.0000
    time: 1577690354.253, epoch: 1, updates: 1000, train loss: -44.009
    evaluating after 1000 updates...
    hamming_loss: 0.00000000 | macro_f1: 0.0097 | micro_f1: 1.0000
    time: 1577691035.145, epoch: 1, updates: 1500, train loss: -46.178
    evaluating after 1500 updates...
    hamming_loss: 0.00000000 | macro_f1: 0.0097 | micro_f1: 1.0000
    time: 1577691704.548, epoch: 2, updates: 2000, train loss: -32.371
    evaluating after 2000 updates...
    hamming_loss: 0.00000000 | macro_f1: 0.0097 | micro_f1: 1.0000

chenkejin avatar Dec 30 '19 07:12 chenkejin

This problem is probably because the labels are converted to lowercase by default during preprocessing, but in utils the label is converted to uppercase, which causes the problem. The relevant code:

    def make_label(l, label_dict):
        length = len(label_dict)
        result = np.zeros(length)
        indices = [label_dict.get(label.strip().upper(), 0) for label in l]
        result[indices] = 1
        return result

Changing upper to lower fixed my problem; F1 no longer stays at 1.
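For reference, a sketch of the corrected function with the change described above (assuming the surrounding utils code is otherwise unchanged):

    import numpy as np

    def make_label(l, label_dict):
        length = len(label_dict)
        result = np.zeros(length)
        # Use lower() so the lookup matches the lowercased labels produced by preprocessing;
        # unknown labels still fall back to index 0.
        indices = [label_dict.get(label.strip().lower(), 0) for label in l]
        result[indices] = 1
        return result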

chenkejin avatar Dec 30 '19 08:12 chenkejin

Be sure to use the MLE method to pre-train the model and use -restore to load the pre-trained checkpoint. I will update the code in the next few days.
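Conceptually, restoring amounts to something like the sketch below; the file name and checkpoint layout here are illustrative assumptions, not the repository's exact format.

    import torch

    def restore(model, path="mle_checkpoint.pt"):  # hypothetical file name
        # Load the MLE-pretrained weights before switching to the RL objective.
        checkpoint = torch.load(path, map_location="cpu")
        # Some training scripts nest the weights under a key such as "model"; adjust to the actual layout.
        state_dict = checkpoint.get("model", checkpoint)
        model.load_state_dict(state_dict)
        return model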

ypengc7512 avatar Jan 06 '20 13:01 ypengc7512

Be sure to use the MLE method to pre-train the model and use -restore to load the pre-trained checkpoint. I will update the code in the next few days.

Does the MLE method mean first training with cross_entropy_loss at a high learning rate until convergence, and then training with RL at a low learning rate? On AAPD this approach does score slightly higher than seq2seq for me, but not by much. The second-stage training process on AAPD:

    acc: 0.4244 |hamming_loss: 0.02515478 | micro_prec: 0.7513| micro_recall: 0.6489| micro_f1: 0.6964
    time: 1578316428.174, epoch: 1, updates: 3600, train loss: -1.142
    evaluating after 3600 updates...
    acc: 0.4304 |hamming_loss: 0.02489527 | micro_prec: 0.7552| micro_recall: 0.6510| micro_f1: 0.6992
    Decaying learning rate to 9.99013e-06
    time: 1578316718.249, epoch: 2, updates: 3900, train loss: -3.031
    evaluating after 3900 updates...
    acc: 0.4324 |hamming_loss: 0.02502503 | micro_prec: 0.7539| micro_recall: 0.6489| micro_f1: 0.6974
    time: 1578317006.099, epoch: 2, updates: 4200, train loss: -1.475
    evaluating after 4200 updates...
    acc: 0.4324 |hamming_loss: 0.02495088 | micro_prec: 0.7548| micro_recall: 0.6497| micro_f1: 0.6983
    time: 1578317291.478, epoch: 2, updates: 4500, train loss: -1.262
    evaluating after 4500 updates...
    acc: 0.4384 |hamming_loss: 0.02459867 | micro_prec: 0.7591| micro_recall: 0.6543| micro_f1: 0.7028
    Decaying learning rate to 9.96057e-06
    time: 1578317575.567, epoch: 3, updates: 4800, train loss: -0.783
    evaluating after 4800 updates...
    acc: 0.4324 |hamming_loss: 0.02500649 | micro_prec: 0.7530| micro_recall: 0.6510| micro_f1: 0.6983

chenkejin avatar Jan 06 '20 13:01 chenkejin

Be sure to use the MLE method to pre-train the model and use -restore to load the pre-trained checkpoint. I will update the code in the next few days.

Hello, how exactly do I pre-train with the MLE method first? I looked through the code and did not find any related setting. Or do you mean using your previously proposed SGM model as the pre-trained model? I tried that as well, but loading the model raises an error: "RuntimeError: Error(s) in loading state_dict for seq2seq: Unexpected key(s) in state_dict: "decoder.gated1.weight", "decoder.gated1.bias", "decoder.gated2.weight", "decoder.gated2.bias"." Looking forward to your reply.
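As a generic PyTorch diagnostic for that error (not advice from the maintainers), loading with strict=False reports exactly which keys differ between the SGM checkpoint and the Seq2Set model; the file name and checkpoint layout below are assumptions.

    import torch

    def inspect_checkpoint(model, path="sgm_checkpoint.pt"):  # hypothetical file name
        checkpoint = torch.load(path, map_location="cpu")
        state_dict = checkpoint.get("model", checkpoint)  # assumes weights may be nested under "model"
        # strict=False skips keys the current model does not have and reports the mismatch.
        missing, unexpected = model.load_state_dict(state_dict, strict=False)
        print("missing keys:", missing)
        print("unexpected keys:", unexpected)  # e.g. the decoder.gated1/gated2 parameters above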

JaeZheng avatar Jun 16 '20 15:06 JaeZheng

Be sure to use the MLE method to pre-train the model and use -restore to load the pre-trained checkpoint. I will update the code in the next few days.

Hello, how exactly do I pre-train with the MLE method first? I looked through the code and did not find any related setting. Or do you mean using your previously proposed SGM model as the pre-trained model? I tried that as well, but loading the model raises an error: "RuntimeError: Error(s) in loading state_dict for seq2seq: Unexpected key(s) in state_dict: "decoder.gated1.weight", "decoder.gated1.bias", "decoder.gated2.weight", "decoder.gated2.bias"." Looking forward to your reply.

Presumably you need to change the loss yourself, from the reward-based computation to a cross-entropy loss.
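In other words, for the MLE pre-training stage the reward-based objective is swapped for plain cross-entropy, roughly as in this illustrative sketch (not the repository's exact code):

    import torch.nn.functional as F

    def mle_loss(logits, target_ids):
        # Stage 1: token-level cross-entropy against the gold label sequence.
        # logits: (seq_len, vocab_size); target_ids: (seq_len,)
        return F.cross_entropy(logits, target_ids)

    def rl_loss(sample_log_probs, sampled_f1, greedy_f1):
        # Stage 2: self-critical objective, reward = F1(sampled) - F1(greedy).
        reward = sampled_f1 - greedy_f1
        return -reward * sample_log_probs.sum()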

MilkWYX avatar Jul 14 '20 09:07 MilkWYX

Hi, I ran into the same problems. How would you do the MLE pretraining, please? And is there any preprocessing needed on the AAPD dataset to make it work?

YoannT avatar Aug 07 '20 09:08 YoannT

Can you give a template for training the model in the README? Thank you!!

NennyYang avatar Nov 07 '23 13:11 NennyYang