Reproducing benchmark for CARLA
I am currently testing the capabilities of NKSR and, to that end, ran some benchmark tests. I downloaded the CARLA data you provided and executed the test script together with the provided metrics code. With the pretrained CARLA backbone, the metrics are as follows:
completeness (68) 0.027692637713934216
accuracy (68) 0.03867699400040561
normals completeness (68) 0.965071364518667
normals accuracy (68) 0.9516956805795869
normals (68) 0.9583835225491268
completeness2 (68) 0.025281473814217505
accuracy2 (68) 0.007717492560607829
chamfer-L2 (68) 0.016499483187412674
chamfer-L1 (68) 0.03318481585716991
f-precision (68) 0.2145197705882353
f-recall (68) 0.28277397766826823
f-score (68) 0.24255425021785418
f-score-15 (68) 0.4549117034594062
f-score-20 (68) 0.6125048929714518
According to the paper, the F-score should be above 0.9.
I have also tested the training procedure on the CARLA data; while the validation accuracies look very promising (also above 90%), the test F-scores are again low.
I also tried switching the precision from 32 to 64, but that did not bring any significant improvement.
How can I reproduce the numbers from the paper?
Hi @lippoldt, thank you for reaching out. For Table 3 we use a different threshold for computing the F-score; this is clarified in the appendix of our paper.
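For context, the F-score at a given distance threshold is the harmonic mean of precision (the fraction of predicted surface points within the threshold of the ground truth) and recall (the fraction of ground-truth points within the threshold of the prediction). Here is a minimal, self-contained sketch of that computation, purely for illustration; it is not the repository's `MeshEvaluator` implementation:

```python
import numpy as np
from scipy.spatial import cKDTree

def f_score(pred_pts, gt_pts, threshold):
    """F-score at a distance threshold: harmonic mean of precision and recall.

    pred_pts, gt_pts: (N, 3) and (M, 3) arrays of points sampled from the
    predicted and ground-truth surfaces, respectively.
    """
    d_pred_to_gt, _ = cKDTree(gt_pts).query(pred_pts)   # accuracy distances
    d_gt_to_pred, _ = cKDTree(pred_pts).query(gt_pts)   # completeness distances
    precision = float((d_pred_to_gt < threshold).mean())
    recall = float((d_gt_to_pred < threshold).mean())
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

On a large-scale outdoor dataset like CARLA, absolute reconstruction errors are naturally larger, so a threshold tuned for object-level benchmarks yields a deceptively low score even for a good reconstruction.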
To evaluate with this F-score, could you please change `metric_names=MeshEvaluator.ESSENTIAL_METRICS` in the following lines:
https://github.com/nv-tlabs/NKSR/blob/0d4e369b1ee641204d6e6d2b53c692fed6273ca5/models/nksr_net.py#L301-L303
to `metric_names=['f-score-outdoor']` and try again? We use a different score because the scales of the datasets are essentially different.
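To see how strongly the threshold dominates the resulting number, here is a toy demonstration using the `f_score` sketch above (both threshold values are placeholders for illustration, not the ones used in the paper; see the appendix for the actual outdoor threshold):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for surface samples: a prediction that deviates from the
# ground truth by ~5 cm of Gaussian noise in a 100 m scene.
gt_pts = rng.uniform(0.0, 100.0, size=(10_000, 3))
pred_pts = gt_pts + rng.normal(scale=0.05, size=gt_pts.shape)

print(f_score(pred_pts, gt_pts, threshold=0.01))  # strict threshold -> near 0
print(f_score(pred_pts, gt_pts, threshold=0.10))  # looser threshold -> much higher
```

The reconstruction is identical in both calls; only the evaluation threshold changes, which is exactly the effect behind the gap you are seeing.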
Sorry for the delayed response; I am happy to assist with any further questions.