
Issue when I try to use nnUNet with another model

Open · chenzhang9476 opened this issue · 6 comments

Hi, all.

I know this isn't your obligation, but I just wanted to post and see if any of you have tried something similar before. I'm trying to use the nnU-Net framework with Swin-Unet, which is a transformer-based network. This is what I encountered: [screenshot of the training log] As you can see, the loss becomes NaN and the pseudo Dice is nan; this persists no matter what I change, and I have tried several times. I simply put Swin-Unet under the build_network_architecture function (it is only used for training; data conversion still goes through the U-Net framework, since it fails otherwise).
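For concreteness, the override is roughly along the lines of the sketch below (the exact build_network_architecture signature differs between nnU-Net versions, and the SwinUnet import stands in for whatever implementation is being used):

```python
# A sketch of the trainer override, assuming the build_network_architecture
# signature of nnU-Net v2 around this time; it differs between releases,
# so match it to the nnUNetTrainer you have installed.
import torch.nn as nn
from nnunetv2.training.nnUNetTrainer.nnUNetTrainer import nnUNetTrainer
from swin_unet import SwinUnet  # hypothetical import; use your own implementation


class nnUNetTrainerSwinUnet(nnUNetTrainer):
    @staticmethod
    def build_network_architecture(plans_manager, dataset_json,
                                   configuration_manager, num_input_channels,
                                   enable_deep_supervision: bool = True) -> nn.Module:
        # the label manager knows how many output channels the head needs
        label_manager = plans_manager.get_label_manager(dataset_json)
        return SwinUnet(  # hypothetical constructor; adapt to your implementation
            in_channels=num_input_channels,
            num_classes=label_manager.num_segmentation_heads,
        )
```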

Thanks for any advice.

chenzhang9476 commented May 17, 2024

Hey, let me tag @TaWald and @saikat-roy here since they have the most experience with this kind of stuff. My 2 cents:

  • nnU-Net's default initial LR is way too high for these architectures
  • you may want to use AdamW instead of SGD here (see the sketch below)
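
A minimal sketch of both changes as a trainer subclass, assuming the configure_optimizers interface and PolyLRScheduler import path of recent nnU-Net v2 (verify against your installed version):

```python
# A sketch combining a lower initial LR with AdamW instead of SGD.
# configure_optimizers returning (optimizer, lr_scheduler) matches recent
# nnU-Net v2; check it against your installed version.
import torch
from nnunetv2.training.nnUNetTrainer.nnUNetTrainer import nnUNetTrainer
from nnunetv2.training.lr_scheduler.polylr import PolyLRScheduler


class nnUNetTrainerAdamW(nnUNetTrainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.initial_lr = 1e-4  # much lower than nnU-Net's SGD default

    def configure_optimizers(self):
        optimizer = torch.optim.AdamW(self.network.parameters(),
                                      lr=self.initial_lr,
                                      weight_decay=self.weight_decay)
        lr_scheduler = PolyLRScheduler(optimizer, self.initial_lr, self.num_epochs)
        return optimizer, lr_scheduler
```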

Best, Fabian

FabianIsensee commented May 28, 2024

Hey @chenzhang9476. Just following up on @FabianIsensee here. In our experience, when we trained SwinUNet using nnUNet as the training framework, we had to reduce the learning rate to 1e-4. We did use AdamW as the optimizer instead of SGD. But my guess is that you would probably need to reduce the learning rate for SGD as well.

saikat-roy commented May 28, 2024

Thank you.

But I'm confused about the deep supervision.


chenzhang9476 commented May 28, 2024

Hey @chenzhang9476. Can you clarify what you mean by confused? Are you trying to switch off deep supervision, or are you trying to use it but are unsuccessful?

saikat-roy commented Jun 4, 2024

Is deep supervision compatible with other new architectures like Swin-Unet?

chenzhang9476 commented Jun 6, 2024

Hey @chenzhang9476. It is compatible in principle, as long as you configure the underlying architecture/model to provide deep-supervision-style outputs to the trainer.

Are you trying to do this for SwinUnet? Can you tell us where you are stuck?
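
For illustration, here is a hypothetical wrapper showing the expected output format. With deep supervision enabled, the trainer's loss expects a list of segmentation maps, highest resolution first, at the scales to which the targets are downsampled. The backbone interface and the heads below are assumptions for illustration, not nnU-Net API:

```python
# A hypothetical wrapper producing deep-supervision-style outputs: a list of
# segmentation maps, highest resolution first. Assumes a 3d configuration
# (use Conv2d for 2d) and a backbone that returns its decoder feature maps,
# highest resolution first.
import torch.nn as nn


class DeepSupervisionWrapper(nn.Module):
    def __init__(self, backbone: nn.Module, decoder_channels, num_classes):
        super().__init__()
        self.backbone = backbone
        # one 1x1x1 segmentation head per decoder stage
        self.heads = nn.ModuleList(
            nn.Conv3d(c, num_classes, kernel_size=1) for c in decoder_channels
        )

    def forward(self, x):
        features = self.backbone(x)
        return [head(f) for head, f in zip(self.heads, features)]
```

If you instead disable deep supervision in the trainer, the network should return a single full-resolution tensor.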

saikat-roy commented Jun 17, 2024

Hi, I tried training with another model using MONAI (e.g., SwinUNETR). Initially, the training was fine, but suddenly, the loss became NaN in the middle of the process. I already set the learning rate to 1e-4 and also tried 1e-5, using AdamW as well. Before updating nnUNet, I never encountered this error. Is there any way to fix this?
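
For anyone debugging this, here is a generic sketch (not nnU-Net-specific) of guarding a training step: skip updates on non-finite losses and clip gradients, which helps localize where things blow up. Mixed precision is a common source of mid-training NaNs with transformer blocks, so trying full fp32 is also worth it. The max_norm value mirrors, from memory, what recent nnU-Net versions clip to; verify against your install:

```python
# A generic NaN guard for a training step, independent of nnU-Net:
# skip the update on a non-finite loss and clip gradient norms.
import torch


def guarded_step(model, optimizer, loss_fn, data, target, max_norm=12.0):
    optimizer.zero_grad(set_to_none=True)
    loss = loss_fn(model(data), target)
    if not torch.isfinite(loss):
        # skipping is safer than poisoning the weights with NaN/Inf gradients
        print("non-finite loss, skipping this step")
        return None
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()
```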

bhaswara commented Oct 3, 2024