loss is -nan

Open jixianruyizhq opened this issue 3 years ago • 0 comments

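Hello. While using the FM model, I found that when the training dataset is relatively large, the train loss comes out as -nan. The dataset is not even that big, only about 300,000 rows. I would like to use several million rows later, but as things stand that does not seem feasible. Is this because many values less than 0 are multiplied together, so that the result becomes infinitesimally small? The training log is included below.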
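On the question of whether multiplying many small values can by itself produce -nan, a minimal floating-point sketch may help. It is not taken from xLearn's source and is not a diagnosis of this run; the helper names `product` and `stable_log_loss` are made up for illustration. It only shows that, in IEEE 754 doubles, a product that shrinks too far underflows to 0.0 rather than NaN, while NaN appears from operations such as inf - inf or inf * 0, and that a logistic loss can be written so it stays finite for very large scores.

```python
import math

def product(values):
    """Multiply values left to right using ordinary float arithmetic."""
    result = 1.0
    for v in values:
        result *= v
    return result

def stable_log_loss(y, score):
    """Logistic loss log(1 + exp(-y * score)) for y in {-1, +1},
    rearranged so that exp() is only ever called on a non-positive number."""
    z = -y * score
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

tiny = product([0.1] * 400)     # underflows to 0.0, which is not NaN
huge = product([10.0] * 400)    # overflows to inf

print(tiny, math.isnan(tiny))        # underflow alone gives zero
print(math.isnan(huge - huge))       # inf - inf is NaN
print(math.isnan(huge * tiny))       # inf * 0 is NaN
print(stable_log_loss(1, -1000.0))   # stays finite even for an extreme score
```

If this picture applies here, underflow alone would not explain the -nan; something would have to overflow or already be non-finite first, either in the model weights or in the input values.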

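Following other people's issues, I lowered the learning rate, but the problem persists. However, when I reduce the size of the training set, training works normally.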

./xlearn_train data-small/vec-small-train.dat -s 1 -v data-small/vec-small-test.dat -s 1 -x acc -r 0.000001

[------------] xLearn uses 24 threads for training task.
[ ACTION ] Read Problem ...
[------------] First check if the text file has been already converted to binary format.
[------------] Binary file (data-small/vec-small-train.dat.bin) NOT found. Convert text file to binary file.
[------------] First check if the text file has been already converted to binary format.
[------------] Binary file (data-small/vec-small-test.dat.bin) NOT found. Convert text file to binary file.
[------------] Number of Feature: 12001
[------------] Time cost for reading problem: 3.75 (sec)
[ ACTION ] Initialize model ...
[------------] Model size: 468.80 KB
[------------] Time cost for model initial: 0.00 (sec)
[ ACTION ] Start to train ...
[------------] Epoch  Train log_loss  Test log_loss  Test Accuracy  Time cost (sec)
[ 10% ]  1  -nan  -nan  0.749360  0.31
[ 20% ]  2  -nan  -nan  0.749360  0.31
[ 30% ]  3  -nan  -nan  0.749360  0.31
[ 40% ]  4  -nan  -nan  0.749360  0.32
[ 50% ]  5  -nan  -nan  0.749360  0.32
[ 60% ]  6  -nan  -nan  0.749360  0.33
[ 70% ]  7  -nan  -nan  0.749360  0.31
[ 80% ]  8  -nan  -nan  0.749360  0.31
[ 90% ]  9  -nan  -nan  0.749360  0.30
[ 100% ] 10  -nan  -nan  0.749360  0.32
[ ACTION ] Start to save model ...
[------------] Model file: data-small/vec-small-train.dat.model
[------------] Time cost for saving model: 0.00 (sec)
[ ACTION ] Finish training
[ ACTION ] Clear the xLearn environment ...
[------------] Total time cost: 6.90 (sec)
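Since the loss is already -nan in the first epoch even with a very small learning rate, one thing that may be worth ruling out is a malformed, non-finite, or extremely large feature value somewhere in the training file. The sketch below is only that: it assumes the data is in libsvm-style text format (`label index:value ...`), which the log does not confirm, and both the helper name `find_suspect_rows` and the `max_abs` threshold are arbitrary choices made for illustration.

```python
import math

def find_suspect_rows(lines, max_abs=1e6):
    """Yield (row_number, reason) for rows whose feature values look unsafe.

    Assumes libsvm-style rows: the first token is the label and every
    following token is an index:value pair. max_abs is an arbitrary cutoff.
    """
    for row_number, line in enumerate(lines, start=1):
        tokens = line.split()
        for token in tokens[1:]:          # skip the label; empty rows yield nothing
            _, sep, raw = token.rpartition(":")
            if not sep:
                yield row_number, f"missing ':' in {token!r}"
                break
            try:
                value = float(raw)
            except ValueError:
                yield row_number, f"unparsable value in {token!r}"
                break
            if not math.isfinite(value):
                yield row_number, f"non-finite value in {token!r}"
                break
            if abs(value) > max_abs:
                yield row_number, f"very large value in {token!r}"
                break

# Small in-memory example; for a real file, pass an open file object instead.
sample = ["1 1:0.5 2:1.0", "0 1:nan 3:2.0", "1 2:1e9", "0 4:abc"]
suspects = list(find_suspect_rows(sample))
for row_number, reason in suspects:
    print(row_number, reason)
```

For an actual run, the same function could be applied to the training file, for example `with open(path) as f: print(list(find_suspect_rows(f))[:20])`, where `path` stands for whichever file triggers the -nan loss. Finding nothing would not prove the data is fine; it would only rule out this particular class of problem.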

jixianruyizhq, Oct 10 '20 11:10