PatchCore_anomaly_detection
loss=nan
Hello, how can I solve loss=nan?
Hi, @machine52vision , have you solved this problem?
Hm. Correct me if I am wrong, but the net is not trained at all (just inference on a pretrained wide_resnet50 to get embedding vectors), so no gradients have to be computed. So it doesn't matter if the loss is NaN.
thanks a lot!
Hi @SDJustus, I want to train my own dataset with this code, not just run inference. So I think it does matter if the loss is NaN.
OK, so if you look at this code from train.py:

```python
for param in self.model.parameters():
    param.requires_grad = False
```

you can see that it is intended not to update the model parameters during training. As you can read in the paper, only the embeddings of a pretrained network are used for further computations on a new dataset (such as minimax facility location and kNN for testing). So again, no network weight updates happen during training, and loss=NaN is totally fine here.
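To make that concrete, here is a minimal sketch of the freezing pattern above. A tiny `nn.Sequential` stands in for the pretrained wide_resnet50 backbone (the stand-in layers are an assumption for illustration; the loop is the same as in train.py):

```python
import torch.nn as nn

# Hypothetical stand-in for the pretrained wide_resnet50 backbone.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3),
)

# Freeze every parameter, exactly as train.py does: the network is only
# a fixed feature extractor, so no gradients are ever computed for it.
for param in backbone.parameters():
    param.requires_grad = False

# Every parameter is now excluded from gradient computation.
assert all(not p.requires_grad for p in backbone.parameters())
```

With all parameters frozen, the optimizer has nothing to update, which is why the reported training loss is meaningless here.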
OK, thanks, understood.
Digging into the PyTorch Lightning code (pytorch_lightning/core/lightning.py): when the progress info is prepared for each batch, the function get_progress_bar_dict contains this logic for assigning the loss value:
```python
if running_train_loss is not None:
    avg_training_loss = running_train_loss.cpu().item()
elif self.automatic_optimization:
    avg_training_loss = float('NaN')
```
Check the definition of automatic_optimization:
```python
def automatic_optimization(self) -> bool:
    """If False you are responsible for calling .backward, .step, zero_grad."""
    return self._automatic_optimization
```
As there is no backward logic during training, automatic_optimization can be set to False to avoid assigning NaN to the loss.
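The effect of that flag can be sketched in plain Python. The function below mirrors the branch from get_progress_bar_dict quoted above (the function name and the `None` return for the manual-optimization case are simplifications for illustration, not Lightning's actual API):

```python
import math

def progress_bar_loss(running_train_loss, automatic_optimization):
    # Mirrors the Lightning snippet: with no running loss recorded and
    # automatic optimization enabled, the reported loss becomes NaN.
    if running_train_loss is not None:
        return running_train_loss
    elif automatic_optimization:
        return float('NaN')
    # Manual optimization: no NaN placeholder is produced.
    return None

# No backward pass ever runs in PatchCore, so running_train_loss stays None.
assert math.isnan(progress_bar_loss(None, True))   # default: loss=nan shown
assert progress_bar_loss(None, False) is None      # manual opt: no NaN
```

This is why flipping automatic_optimization to False makes the loss=nan output disappear.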
I've modified the function configure_optimizers in train.py, and loss=NaN is no longer printed:
```python
def configure_optimizers(self):
    self.automatic_optimization = False
    return None
```