NeuralRecon

About the training config.

Open SwingWillwow opened this issue 2 years ago • 3 comments

First of all, thanks for your excellent work! However, I cannot reproduce the reported performance. I followed the training config in the official training script and trained the model on two RTX3090 GPUs. The result is: AbsRel 0.075, AbsDiff 0.131, SqRel 0.037, LogRMSE 0.121, r1 0.926, r2 0.961, r3 0.974, complete 0.864, dist1 0.075, dist2 0.189, prec 0.482, recall 0.297, fscore 0.365

The prec, recall, and fscore are far below those of your released pretrained model. I think this might be due to differences in hardware and training settings. Could you provide the detailed training settings (e.g., batch_size, learning rate, number of epochs) and the hardware used (e.g., RTX2080 or Tesla V100)? It would be a great help to my research. Looking forward to your reply!

SwingWillwow • Aug 21 '22 09:08

I ran another experiment with 8 A6000 GPUs and batch_size set to 2. This time the result is much closer to the reported one, although a gap remains. The result is: AbsRel 0.067, AbsDiff 0.103, SqRel 0.038, RMSE 0.199, LogRMSE 0.113, r1 0.935, r2 0.962, r3 0.974, complete 0.901, dist1 0.061, dist2 0.135, prec 0.635, recall 0.431, fscore 0.512

So a potential way to further improve the performance is to use a larger batch_size and more GPUs. If I get any further improvement, I will report it in this issue. I am still looking forward to an official training config.

SwingWillwow • Aug 23 '22 12:08

Did you try training with batch_size=1? I see the default setting is batch_size=1.

HLinChen • Aug 31 '22 02:08

> Did you try training with batch_size=1? I see the default setting is batch_size=1.

Previous issues have noted that training on 8 RTX2080ti GPUs with batch_size=1 gives worse performance. I set batch_size=4 and trained the model on 8 A6000 GPUs. The result is closer to the reported one this time. Specifically: AbsRel 0.064, AbsDiff 0.099, SqRel 0.037, RMSE 0.195, LogRMSE 0.112, r1 0.935, r2 0.963, r3 0.976, complete 0.897, dist1 0.057, dist2 0.131, prec 0.668, recall 0.460, fscore 0.542
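
To compare the three runs in this thread side by side, here is a minimal Python sketch. The numbers are copied from the comments above; the `runs` list, the setup labels, and the helper names `harmonic_f` and `summarize` are illustrative and not part of the NeuralRecon code. It also recomputes the F-score as the harmonic mean of prec and recall as a sanity check. The recomputed values differ slightly from the reported fscore, possibly because the evaluation averages per-scene scores, but that is an assumption and not something confirmed in this thread.

```python
# Metrics copied from the three runs reported in this thread.
# The "setup" labels are informal descriptions, not config names.
runs = [
    {"setup": "2x RTX3090, official script", "prec": 0.482, "recall": 0.297, "fscore": 0.365},
    {"setup": "8x A6000, batch_size=2", "prec": 0.635, "recall": 0.431, "fscore": 0.512},
    {"setup": "8x A6000, batch_size=4", "prec": 0.668, "recall": 0.460, "fscore": 0.542},
]


def harmonic_f(prec, recall):
    """Standard F-score: harmonic mean of precision and recall."""
    if prec + recall == 0:
        return 0.0
    return 2 * prec * recall / (prec + recall)


def summarize(runs):
    """Return (setup, reported fscore, recomputed fscore) for each run."""
    rows = []
    for r in runs:
        recomputed = harmonic_f(r["prec"], r["recall"])
        rows.append((r["setup"], r["fscore"], round(recomputed, 3)))
    return rows


for setup, reported, recomputed in summarize(runs):
    print(f"{setup:30s} reported={reported:.3f} recomputed={recomputed:.3f}")
```

The reported fscore rises with each change of setup (0.365, 0.512, 0.542), which matches the trend described in the comments.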

SwingWillwow • Aug 31 '22 05:08