PatchCore_anomaly_detection

loss=nan

machine52vision opened this issue 3 years ago · 7 comments

Hello, how can I solve loss=nan?

machine52vision avatar Aug 27 '21 02:08 machine52vision

Hi @machine52vision, have you solved this problem?

XiaoPengZong avatar Sep 07 '21 01:09 XiaoPengZong

Hm. Correct me if I am wrong, but the net is not trained at all (it is just inference on a pretrained wide_resnet50 to get embedding vectors), so no gradients have to be computed. In that case, it doesn't matter if the loss is NaN.
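For illustration, a minimal sketch (not the repository's exact code; the hook setup, layer choice, and names are just for demonstration, and the torchvision weights-loading API may differ by version) of what "inference only" means here: the pretrained wide_resnet50 is frozen and only queried for intermediate feature maps, so no backward pass ever runs.

```python
# Illustrative sketch: PatchCore-style feature extraction only runs a frozen,
# pretrained backbone forward; nothing is optimized, so there is no
# meaningful loss to report.
import torch
from torchvision.models import wide_resnet50_2

backbone = wide_resnet50_2(pretrained=True).eval()
for p in backbone.parameters():
    p.requires_grad = False          # backbone stays frozen

features = {}
def save(name):                      # hook to grab mid-level feature maps
    def _hook(_module, _inputs, output):
        features[name] = output
    return _hook

backbone.layer2.register_forward_hook(save("layer2"))
backbone.layer3.register_forward_hook(save("layer3"))

with torch.no_grad():                # pure inference, no gradients / backward
    _ = backbone(torch.randn(1, 3, 224, 224))

print(features["layer2"].shape, features["layer3"].shape)
```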

SDJustus avatar Sep 07 '21 06:09 SDJustus

thanks a lot!

machine52vision avatar Sep 07 '21 06:09 machine52vision

Hi @SDJustus, I want to train on my own dataset with this code, not just run inference. So I think it does matter if the loss is NaN.

XiaoPengZong avatar Sep 07 '21 06:09 XiaoPengZong

OK, so if you look at this code from train.py:

```python
for param in self.model.parameters():
    param.requires_grad = False
```

you can see that it is intentional not to update the model parameters during training. As you can read in the paper, only the embeddings of a pretrained network are used for the further computations on a new dataset (such as minimax facility location for coreset subsampling and kNN at test time). So again, no network weight updates are done during training, and a NaN loss is totally fine here.
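To make those two steps concrete, here is a rough, illustrative sketch of greedy (minimax facility location) coreset subsampling and kNN scoring; the function names, tensor shapes, and coreset size are assumptions for demonstration, not the repository's API.

```python
# Illustrative sketch of coreset subsampling + kNN scoring, assuming the
# training patch embeddings are already stacked into an (N, D) tensor.
import torch

def greedy_coreset(embeddings: torch.Tensor, n_select: int) -> torch.Tensor:
    """Greedy minimax facility location: repeatedly pick the point that is
    farthest from the points selected so far."""
    selected = [0]                                     # seed with an arbitrary point
    min_dist = torch.cdist(embeddings, embeddings[0:1]).squeeze(1)
    for _ in range(n_select - 1):
        idx = int(torch.argmax(min_dist))              # farthest remaining point
        selected.append(idx)
        d = torch.cdist(embeddings, embeddings[idx:idx + 1]).squeeze(1)
        min_dist = torch.minimum(min_dist, d)          # distance to nearest selected
    return embeddings[selected]

def knn_anomaly_score(test_patches: torch.Tensor, memory_bank: torch.Tensor) -> torch.Tensor:
    """Anomaly score per test patch = distance to its nearest memory-bank patch."""
    dists = torch.cdist(test_patches, memory_bank)     # (M, K) pairwise distances
    return dists.min(dim=1).values

embeddings = torch.randn(1000, 1536)                   # fake training patch embeddings
memory_bank = greedy_coreset(embeddings, n_select=100)
scores = knn_anomaly_score(torch.randn(50, 1536), memory_bank)
print(scores.shape)                                    # torch.Size([50])
```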

SDJustus avatar Sep 07 '21 07:09 SDJustus

OK, thanks, got it.

XiaoPengZong avatar Sep 07 '21 07:09 XiaoPengZong

Digging into the PyTorch Lightning code (pytorch_lightning\core\lightning.py): when the progress-bar info for each batch is prepared, get_progress_bar_dict assigns the loss value with this logic:

```python
if running_train_loss is not None:
    avg_training_loss = running_train_loss.cpu().item()
elif self.automatic_optimization:
    avg_training_loss = float('NaN')
```

Check the definition of automatic_optimization:

```python
def automatic_optimization(self) -> bool:
    """
    If False you are responsible for calling .backward, .step, zero_grad.
    """
    return self._automatic_optimization
```

Since there is no backward pass during training, automatic_optimization can be set to False to avoid assigning NaN to the loss. I've modified configure_optimizers in train.py as below, and loss=NaN is not printed anymore:

```python
def configure_optimizers(self):
    self.automatic_optimization = False
    return None
```
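As a follow-up, here is a minimal illustrative sketch (the class and attribute names are made up, not the repo's actual train.py) showing the same idea with the flag set in __init__, which is where the PyTorch Lightning docs recommend enabling manual optimization; with it enabled, the progress bar no longer falls back to a NaN loss.

```python
# Minimal sketch, assuming a batch of (image, label) pairs and a frozen
# backbone: with manual optimization enabled, Lightning does not inject a
# NaN loss, and a training_step that only collects embeddings is fine.
import torch
import pytorch_lightning as pl

class FeatureCollector(pl.LightningModule):
    def __init__(self, backbone):
        super().__init__()
        self.automatic_optimization = False   # same effect as the fix above
        self.backbone = backbone.eval()
        self.embeddings = []

    def training_step(self, batch, batch_idx):
        x, _ = batch
        with torch.no_grad():                 # inference only, no backward pass
            self.embeddings.append(self.backbone(x))
        # returning nothing is allowed with manual optimization

    def configure_optimizers(self):
        return None                           # nothing to optimize
```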

zhangjunli177 avatar Apr 20 '22 00:04 zhangjunli177