Big difference in results between PyTorch 2.5.1 and 2.7.1
Hey guys,
Did anybody experience a significant difference in results between using nnUNet with PyTorch 2.5.1 and PyTorch 2.7.1?
I recently bought a new machine with an RTX 5090, which requires PyTorch 2.7.1+ and CUDA 12.8+, so I had to upgrade my packages. Now, whenever I train the exact same model on the exact same dataset (bear in mind I trained several times with PyTorch 2.5.1 on GCP and with PyTorch 2.7.1 on my local machine), I consistently get better results with PyTorch 2.5.1. Why is that?
Any guidance would be appreciated, thanks!
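For what it's worth, here is roughly how I'd pin the obvious sources of run-to-run randomness when comparing the two setups. This is generic PyTorch, not nnU-Net's own trainer code, so treat it as a sketch; it won't remove genuine numerical differences between versions, but it makes the comparison less noisy:

```python
import random

import numpy as np
import torch


def make_deterministic(seed: int = 42) -> None:
    """Pin the usual sources of run-to-run variation before training.

    Note: this does not eliminate all differences between PyTorch builds
    (kernel selection and numerics can still change between versions),
    it only reduces the noise within each environment.
    """
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Disable cuDNN autotuning so the same convolution algorithms are picked each run.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    # Optionally flag nondeterministic ops; warn_only avoids hard errors on unsupported ops.
    torch.use_deterministic_algorithms(True, warn_only=True)


make_deterministic(42)
```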
I haven't had time to test older torch versions, but I've had the same feeling since upgrading for the same reasons. I'll try to run a comparison as well, since I've been wondering about this too. If you're seeing the same thing, at least I know I'm not crazy.
Hey, that's quite interesting! I have not been running a lot of systematic experiments recently. Can you please share some more details? What kind of segmentation dataset is this, and how big is the difference? If you are using a private dataset, can you point me to a public dataset where you see the same effect?
Hey @FabianIsensee! Sadly, no, I cannot provide details of the dataset. But this was basically binary segmentation on 2D ultrasound cardiac images. Nothing special, but performance did drop from 0.58 Dice to 0.5, which I do consider significant in this case. The only other thing I can tell you is that I originally used PyTorch 2.5.1; once I upgraded the GPU I had to upgrade PyTorch as well, which led to faster training but worse performance.
I am running extensive experiments now comparing 2.5.1 vs 2.8.0
Experiment will run for a while still, but if we see something then this might be the culprit: https://github.com/pytorch/pytorch/issues/163539
This is certainly interesting. Is there anything I can do to assist besides running my own test between the two versions? I am trying to learn and improve my skills, and debugging models is certainly a challenge.
Honestly, the only thing that needs doing is to train a bunch of nnU-Nets and see what the results are. This will be done by tomorrow. I don't know if the bfloat16 error even affects us, because nnU-Net uses float16, which has much broader GPU support.
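For anyone following along, the distinction the linked issue hinges on is simply which dtype autocast runs in. A minimal illustrative sketch in plain PyTorch (not nnU-Net's trainer code) of the two mixed-precision modes:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 8, kernel_size=3, padding=1).cuda()
x = torch.randn(2, 1, 64, 64, device="cuda")

# float16 autocast: works on a much wider range of GPUs, but its narrow
# exponent range means gradients need loss scaling (GradScaler).
scaler = torch.amp.GradScaler("cuda")
with torch.autocast(device_type="cuda", dtype=torch.float16):
    out_fp16 = model(x)

# bfloat16 autocast: same exponent range as float32 (no loss scaling needed),
# but only well supported on Ampere-class GPUs and newer.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    out_bf16 = model(x)

print(out_fp16.dtype, out_bf16.dtype)  # torch.float16 torch.bfloat16
```

So if the regression in the linked issue is specific to bfloat16 kernels, a float16-based training pipeline would not necessarily be affected.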
Thank you guys for taking a look at this! Appreciate it! Let me know if I can help somehow
I am not really seeing huge differences in performance, tbh. The only big difference is D10 Colon, but that one is notoriously noisy.
Could this be due to the choice of graphics card? Maybe even the combination of the card and the new PyTorch version?
Honestly, I don't know. My experiments were run on an A100. Can you maybe confirm on your end that you also see this drop with a public dataset? I need to be able to reproduce it, and private datasets get in the way of that.