xillee
I tried fp16, but only 7% of the GPU is utilized. In this case, how can I improve performance?
Adding the test results on the validation set. The following is from the inference run:
[2024/07/30 13:14:25] ppcls INFO: [Eval][Epoch 1][Iter: 0/195]CELoss: 0.52470, loss: 0.52470, top1: 0.87500, batch_cost: 0.29910s, reader_cost: 0.26422, ips: 53.49302 images/sec
[2024/07/30 13:14:28] ppcls INFO: [Eval][Epoch 1][Iter: 10/195]CELoss: 0.41228, loss: 0.41228,...
The logs are attached: [eval.log](https://github.com/user-attachments/files/16427795/eval.log) [train.log](https://github.com/user-attachments/files/16427830/train.log)
Updated results using ResNet18: [train.log](https://github.com/user-attachments/files/16435195/train.log) [eval.log](https://github.com/user-attachments/files/16435197/eval.log)
This run was trained for only 1 epoch, using ResNet50. Could you please take another look at the cause?

Training:
python tools/train.py -c D:\lixiaolin\PaddleClas-release-2.5\ppcls\configs\ImageNet\ResNet\ResNet50.yaml
[train.log](https://github.com/user-attachments/files/16440085/train.log)

Evaluation:
python tools/eval.py -c D:\lixiaolin\PaddleClas-release-2.5\ppcls\configs\ImageNet\ResNet\ResNet50.yaml -o ARCHITECTURE.name="ResNet50" -o pretrained_model=D:\lixiaolin\PaddleClas-release-2.5\output\ResNet50\best.pdparams
[eval.log](https://github.com/user-attachments/files/16440086/eval.log)
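[result.txt](https://github.com/user-attachments/files/16440333/result.txt)

I also ran inference on the training dataset. In the tally, 262 predictions were wrong and 5862 were correct, which corresponds to a classification accuracy of 95.72%. That is very different from the eval result.

To summarize:
- The training set result is 0.78557.
- The validation set result evaluated during training is 0.96635.
- Running evaluation directly gives 0.48237.
- Running inference directly on the training and validation sets gives 0.9572.

The training and validation data are split 8:2. Could you please help explain why these numbers differ?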
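As a quick check of the arithmetic behind the 95.72% figure, here is a minimal sketch that recomputes accuracy from the correct and wrong counts reported above. The function name `accuracy` is only illustrative and is not part of PaddleClas.

```python
def accuracy(correct: int, wrong: int) -> float:
    """Return the fraction of correct predictions."""
    total = correct + wrong
    if total == 0:
        raise ValueError("no predictions to score")
    return correct / total


# Counts taken from the inference tally in this thread.
acc = accuracy(correct=5862, wrong=262)
print(f"{acc:.4f}")  # 5862 / 6124
```

The recomputed value agrees with the 0.9572 reported from direct inference, so the gap is between that number and the 0.48237 from `tools/eval.py`, not in the tally itself.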
One more question: how is this average calculated? However I compute it, I cannot get to 0.98211.
[2024/07/31 20:24:32] ppcls INFO: [Train][Epoch 24/120][Iter: 0/304]lr(PiecewiseDecay): 0.00100000, top1: 1.00000, CELoss: 0.01757, loss: 0.01757, batch_cost: 0.61280s, reader_cost: 0.12493, ips: 26.10981 samples/s, eta: 5:01:10
[2024/07/31 20:24:38] ppcls INFO: [Train][Epoch 24/120][Iter:...
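One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that the logged value is a running average weighted by batch size rather than a plain mean of the per-iteration numbers. The sketch below shows how the two can differ. The class `RunningAverage` and the batch values are hypothetical and are not taken from the logs.

```python
class RunningAverage:
    """Sample-weighted running mean (hypothetical helper, not PaddleClas code)."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, value: float, n: int = 1) -> None:
        # Weight each batch metric by the number of samples in the batch.
        self.total += value * n
        self.count += n

    @property
    def avg(self) -> float:
        return self.total / self.count if self.count else 0.0


# Made-up (top1, batch_size) pairs; the last batch is smaller than the rest.
batches = [(1.0, 16), (0.9375, 16), (1.0, 8)]

meter = RunningAverage()
for top1, batch_size in batches:
    meter.update(top1, batch_size)

weighted = meter.avg
unweighted = sum(top1 for top1, _ in batches) / len(batches)
print(weighted, unweighted)
```

If the logged per-iteration values are themselves already running averages, then averaging them by hand would not reproduce the epoch summary either, so it may be worth checking which quantity each log line actually reports.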
> The trained weights were not loaded during eval

Is there something wrong with how I am using it? Why were the weights not loaded?
Then what should we use as the basis for selecting the best model?