Big difference in results between PyTorch 2.5.1 and 2.7.1
Hey guys,
Did anybody experience a significant difference in results between using nnUNet with PyTorch 2.5.1 and PyTorch 2.7.1?
I recently bought a new machine with an RTX 5090, which requires PyTorch 2.7.1+ and CUDA 12.8+, so I had to upgrade my packages. Now, whenever I train the exact same model on the exact same dataset (bear in mind I trained several times with PyTorch 2.5.1 on GCP and with PyTorch 2.7.1 on my local machine), I consistently get better results with PyTorch 2.5.1. Why is that?
Any guidance would be appreciated, thanks!
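For what it's worth, here is roughly how I'd pin the obvious sources of run-to-run randomness when comparing the two setups. This is generic PyTorch, not nnU-Net's own trainer code, so treat it as a sketch; it won't remove genuine numerical differences between versions, but it makes the comparison less noisy:

```python
import random

import numpy as np
import torch


def make_deterministic(seed: int = 42) -> None:
    """Pin the usual sources of run-to-run variation before training.

    Note: this does not eliminate all differences between PyTorch builds
    (kernel selection and numerics can still change between versions),
    it only reduces the noise within each environment.
    """
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Disable cuDNN autotuning so the same convolution algorithms are picked each run.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    # Optionally flag nondeterministic ops; warn_only avoids hard errors on unsupported ops.
    torch.use_deterministic_algorithms(True, warn_only=True)


make_deterministic(42)
```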
I haven't had time to test older torch versions, but I've had the same feeling since upgrading for the same reasons. I'll try to run a comparison as well, since I've been wondering about this too. If you're seeing the same thing, at least I know I'm not crazy.
Hey, that's quite interesting! I have not been running a lot of systematic experiments recently. Can you please share some more details? What kind of segmentation dataset is this, and how big is the difference? If you are using a private dataset, can you point me to a public dataset where you see the same effect?
Hey @FabianIsensee! Sadly, no, I cannot provide details of the dataset. But this was basically binary segmentation on 2D ultrasound cardiac images. Nothing special, but performance did drop from 0.58 Dice to 0.5, which I do consider significant in this case. The only other thing I can tell you is that I originally used PyTorch 2.5.1; once I upgraded the GPU I had to upgrade PyTorch as well, which led to faster training but worse performance.
I am running extensive experiments now comparing 2.5.1 vs 2.8.0
Experiment will run for a while still, but if we see something then this might be the culprit: https://github.com/pytorch/pytorch/issues/163539
This is certainly interesting. Is there anything I can do to assist besides running my own test between the two versions? I am trying to learn and improve my skills, and debugging models is certainly a challenge.
Honestly, the only thing that needs doing is to train a bunch of nnU-Nets and see what the results are. This will be done by tomorrow. I don't know if the bfloat16 error even affects us, because nnU-Net uses float16, which has much broader GPU support.
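For anyone following along, the distinction the linked issue hinges on is simply which dtype autocast runs in. A minimal illustrative sketch in plain PyTorch (not nnU-Net's trainer code) of the two mixed-precision modes:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 8, kernel_size=3, padding=1).cuda()
x = torch.randn(2, 1, 64, 64, device="cuda")

# float16 autocast: works on a much wider range of GPUs, but its narrow
# exponent range means gradients need loss scaling (GradScaler).
scaler = torch.amp.GradScaler("cuda")
with torch.autocast(device_type="cuda", dtype=torch.float16):
    out_fp16 = model(x)

# bfloat16 autocast: same exponent range as float32 (no loss scaling needed),
# but only well supported on Ampere-class GPUs and newer.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    out_bf16 = model(x)

print(out_fp16.dtype, out_bf16.dtype)  # torch.float16 torch.bfloat16
```

So if the regression in the linked issue is specific to bfloat16 kernels, a float16-based training pipeline would not necessarily be affected.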
Thank you guys for taking a look at this! Appreciate it! Let me know if I can help somehow
I am not really seeing huge differences in performance, tbh. The only big difference is D10 Colon, but that one is notoriously noisy.
Could this be due to the choice of graphics card? Maybe even the combination of the card and the new PyTorch version?
Honestly, I don't know. My experiments were run on an A100. Can you maybe confirm on your end that you also see this drop with a public dataset? I need to be able to reproduce it, and private datasets get in the way of that.