
DCN full-data model: is the epoch_num setting misconfigured?

Open USTCKAY opened this issue 2 years ago • 2 comments

In DCN's config_bigdata.yaml, epochs is set to 10. During training the AUC climbs well above the published benchmark of 0.777, but at inference the AUC never reaches the training level, and the models saved in the last few epochs (epochs 7 to 9) fall below the benchmark altogether. I suspect overfitting. Is this parameter misconfigured? I also noticed that other models using the Criteo dataset set the epoch parameter to 1. Would 1 be the better setting for DCN as well?

Training AUC:
2022-11-24 23:10:42,973 - INFO - epoch: 0 done, auc: 0.797744, epoch time: 10259.33 s
2022-11-25 01:56:55,257 - INFO - epoch: 1 done, auc: 0.813983, epoch time: 9972.15 s
2022-11-25 04:42:34,678 - INFO - epoch: 2 done, auc: 0.824308, epoch time: 9939.29 s
2022-11-25 07:27:25,202 - INFO - epoch: 3 done, auc: 0.832276, epoch time: 9890.38 s
2022-11-25 10:09:49,307 - INFO - epoch: 4 done, auc: 0.838268, epoch time: 9743.98 s
2022-11-25 12:52:43,571 - INFO - epoch: 5 done, auc: 0.842872, epoch time: 9774.13 s
2022-11-25 15:36:14,327 - INFO - epoch: 6 done, auc: 0.846537, epoch time: 9810.63 s
2022-11-25 18:24:26,853 - INFO - epoch: 7 done, auc: 0.849568, epoch time: 10092.39 s
2022-11-25 21:05:38,557 - INFO - epoch: 8 done, auc: 0.852145, epoch time: 9671.57 s
2022-11-25 23:49:26,100 - INFO - epoch: 9 done, auc: 0.854425, epoch time: 9827.41 s

Inference AUC:
2022-11-28 10:05:40,387 - INFO - epoch: 0 done, auc: 0.800802, epoch time: 407.88 s
2022-11-28 10:12:20,827 - INFO - epoch: 1 done, auc: 0.801374, epoch time: 400.44 s
2022-11-28 10:19:00,820 - INFO - epoch: 2 done, auc: 0.796485, epoch time: 399.99 s
2022-11-28 10:25:47,121 - INFO - epoch: 3 done, auc: 0.790854, epoch time: 406.30 s
2022-11-28 10:32:18,459 - INFO - epoch: 4 done, auc: 0.786025, epoch time: 391.34 s
2022-11-28 10:38:54,057 - INFO - epoch: 5 done, auc: 0.782249, epoch time: 395.60 s
2022-11-28 10:45:33,497 - INFO - epoch: 6 done, auc: 0.778672, epoch time: 399.44 s
2022-11-28 10:52:11,217 - INFO - epoch: 7 done, auc: 0.775637, epoch time: 397.72 s
2022-11-28 10:58:49,925 - INFO - epoch: 8 done, auc: 0.773437, epoch time: 398.71 s
2022-11-28 11:05:31,881 - INFO - epoch: 9 done, auc: 0.770960, epoch time: 401.96 s

USTCKAY, Nov 28 '22 02:11
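To make the train/inference comparison concrete, here is a small Python sketch that parses the per-epoch AUC lines quoted in the issue, computes the train-minus-inference gap for each epoch, and reports which checkpoint scores best at inference and which fall below the 0.777 benchmark. It assumes only the log format shown in this thread ("epoch: N done, auc: X"); the names parse_auc, TRAIN_LOG, INFER_LOG and BENCHMARK_AUC are hypothetical helpers for illustration, not part of PaddleRec.

```python
import re

# Per-epoch AUC lines copied verbatim from the issue (hypothetical variable names).
TRAIN_LOG = """
2022-11-24 23:10:42,973 - INFO - epoch: 0 done, auc: 0.797744, epoch time: 10259.33 s
2022-11-25 01:56:55,257 - INFO - epoch: 1 done, auc: 0.813983, epoch time: 9972.15 s
2022-11-25 04:42:34,678 - INFO - epoch: 2 done, auc: 0.824308, epoch time: 9939.29 s
2022-11-25 07:27:25,202 - INFO - epoch: 3 done, auc: 0.832276, epoch time: 9890.38 s
2022-11-25 10:09:49,307 - INFO - epoch: 4 done, auc: 0.838268, epoch time: 9743.98 s
2022-11-25 12:52:43,571 - INFO - epoch: 5 done, auc: 0.842872, epoch time: 9774.13 s
2022-11-25 15:36:14,327 - INFO - epoch: 6 done, auc: 0.846537, epoch time: 9810.63 s
2022-11-25 18:24:26,853 - INFO - epoch: 7 done, auc: 0.849568, epoch time: 10092.39 s
2022-11-25 21:05:38,557 - INFO - epoch: 8 done, auc: 0.852145, epoch time: 9671.57 s
2022-11-25 23:49:26,100 - INFO - epoch: 9 done, auc: 0.854425, epoch time: 9827.41 s
"""

INFER_LOG = """
2022-11-28 10:05:40,387 - INFO - epoch: 0 done, auc: 0.800802, epoch time: 407.88 s
2022-11-28 10:12:20,827 - INFO - epoch: 1 done, auc: 0.801374, epoch time: 400.44 s
2022-11-28 10:19:00,820 - INFO - epoch: 2 done, auc: 0.796485, epoch time: 399.99 s
2022-11-28 10:25:47,121 - INFO - epoch: 3 done, auc: 0.790854, epoch time: 406.30 s
2022-11-28 10:32:18,459 - INFO - epoch: 4 done, auc: 0.786025, epoch time: 391.34 s
2022-11-28 10:38:54,057 - INFO - epoch: 5 done, auc: 0.782249, epoch time: 395.60 s
2022-11-28 10:45:33,497 - INFO - epoch: 6 done, auc: 0.778672, epoch time: 399.44 s
2022-11-28 10:52:11,217 - INFO - epoch: 7 done, auc: 0.775637, epoch time: 397.72 s
2022-11-28 10:58:49,925 - INFO - epoch: 8 done, auc: 0.773437, epoch time: 398.71 s
2022-11-28 11:05:31,881 - INFO - epoch: 9 done, auc: 0.770960, epoch time: 401.96 s
"""

# Matches the "epoch: N done, auc: X" fragment of each log line.
AUC_RE = re.compile(r"epoch: (\d+) done, auc: ([0-9.]+)")

BENCHMARK_AUC = 0.777  # benchmark AUC quoted in the issue


def parse_auc(log: str) -> dict[int, float]:
    """Map epoch index -> AUC for every matching log line."""
    return {int(m.group(1)): float(m.group(2)) for m in AUC_RE.finditer(log)}


train = parse_auc(TRAIN_LOG)
infer = parse_auc(INFER_LOG)

# Positive gap: the model scores higher on its training data than at inference.
gaps = {e: train[e] - infer[e] for e in sorted(train)}
best_epoch = max(infer, key=infer.get)  # checkpoint with the highest inference AUC
below = [e for e, auc in infer.items() if auc < BENCHMARK_AUC]

for e, gap in gaps.items():
    print(f"epoch {e}: train {train[e]:.4f}  infer {infer[e]:.4f}  gap {gap:+.4f}")
print("best inference epoch:", best_epoch)
print("below benchmark:", below)
```

On these logs the best inference checkpoint is epoch 1, epochs 7, 8 and 9 fall below the benchmark, and the gap grows every single epoch while training AUC keeps rising, which is the overfitting signature discussed in this thread.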

It does look like overfitting. Thanks for the suggestion.

wangzhen38, Nov 29 '22 08:11
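One common way to choose the epoch count from data instead of fixing it in the config is early stopping: evaluate a held-out AUC after each epoch, stop once it has failed to improve for a set number of epochs (the patience), and keep the best checkpoint. The Python sketch below applies that rule to the inference AUCs reported in this issue. It is a generic illustration, not a PaddleRec feature; early_stop_epoch, patience and min_delta are hypothetical names. Because these AUCs come from the inference (test) run, a real setup should make the stopping decision on a separate validation split so the test set is not used for tuning.

```python
def early_stop_epoch(val_aucs, patience=1, min_delta=0.0):
    """Return (best_epoch, stop_epoch) under a simple patience rule.

    Training stops at stop_epoch once the AUC has failed to beat the best
    value by more than min_delta for `patience` consecutive epochs;
    best_epoch is the checkpoint to keep.
    """
    best_epoch, best_auc, waited = 0, float("-inf"), 0
    for epoch, auc in enumerate(val_aucs):
        if auc > best_auc + min_delta:
            # New best: remember it and reset the patience counter.
            best_epoch, best_auc, waited = epoch, auc, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    # Rule never triggered: training ran through every epoch.
    return best_epoch, len(val_aucs) - 1


# Inference AUC per epoch (epochs 0-9), as reported in the issue.
INFER_AUC = [0.800802, 0.801374, 0.796485, 0.790854, 0.786025,
             0.782249, 0.778672, 0.775637, 0.773437, 0.770960]

print(early_stop_epoch(INFER_AUC, patience=1))  # (1, 2)
print(early_stop_epoch(INFER_AUC, patience=2))  # (1, 3)
```

With patience 1 the rule stops after epoch 2 and keeps the epoch 1 checkpoint, which is close to the single-epoch setting used by the other Criteo models mentioned in the question.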

> It does look like overfitting. Thanks for the suggestion.

The same problem seems to exist in other models too, DIN for example. It may be worth investigating if time and manpower allow.

USTCKAY, Nov 30 '22 06:11