nnUNet
Why is my Dice score on the test set so much higher than on the validation set?
First of all, thank you for your outstanding contributions.
While training on my own dataset, I ran into a problem. The dataset consists of 108 cases of 3D images with labels. I held out 21 cases as a test set and put the remaining 87 cases into nnunetv2 for 5-fold cross-validation.
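For context, a hold-out split like this can be generated reproducibly with a fixed seed. The following is a minimal sketch, not code from the original post: the case names, seed, and output path are placeholder assumptions, and it writes the folds in the `splits_final.json` layout that nnU-Net reads.

```python
import json
from sklearn.model_selection import KFold, train_test_split

# Hypothetical case identifiers -- replace with the real 108 case names.
cases = [f"case_{i:03d}" for i in range(108)]

# Hold out 21 cases for testing (the seed is an assumption, not from the post).
train_cases, test_cases = train_test_split(cases, test_size=21, random_state=42)

# Assign the remaining 87 cases to 5 folds in nnU-Net's splits_final.json layout.
splits = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=42).split(train_cases):
    splits.append({
        "train": [train_cases[i] for i in train_idx],
        "val": [train_cases[i] for i in val_idx],
    })

with open("splits_final.json", "w") as f:
    json.dump(splits, f, indent=2)
```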
Strangely, the Dice score on each validation fold is only 0.3 to 0.4, while the Dice on the test set reaches an astonishing 0.79.
I would like to know whether this situation is common, and what might cause it.
Thank you in advance for your answer!
summary.json from one of the validation folds:
summary.json from the test set:
Hi @ZYH23333,
sorry for the late reply to this issue. Did you by any chance take a closer look at the samples / labels to see whether there are major qualitative differences between the validation cases of the individual folds and the test cases? Maybe the test cases happen to be comparatively "easy" cases? On typical datasets you should not see such stark differences in Dice scores. Did you make any changes to the trainer, or did you use the default?
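One quick way to check this is to compare the per-case Dice values from the two attached summary.json files. Below is a minimal sketch (not from this thread) assuming the nnU-Net v2 summary.json layout with a `metric_per_case` list; the file paths are placeholders.

```python
import json

def per_case_dice(summary_path: str) -> list[float]:
    """Collect the mean foreground Dice of every case in a nnU-Net v2 summary.json."""
    with open(summary_path) as f:
        summary = json.load(f)
    dices = []
    for case in summary["metric_per_case"]:
        # Average the Dice over all foreground labels of this case.
        label_dices = [m["Dice"] for m in case["metrics"].values()]
        dices.append(sum(label_dices) / len(label_dices))
    return sorted(dices)

# Placeholder paths -- point these at the two attached summary.json files.
val_dice = per_case_dice("fold_0/validation/summary.json")
test_dice = per_case_dice("test_predictions/summary.json")
print("validation per-case Dice:", [f"{d:.2f}" for d in val_dice])
print("test per-case Dice:      ", [f"{d:.2f}" for d in test_dice])
```

If the two distributions barely overlap (e.g. the validation folds contain a few near-zero outliers that the test set lacks), that would point to the test cases being systematically easier rather than to a training problem.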
Closing this issue for now, as it has been stale for a while. You are welcome to re-open if the problem persists!