TCFormer
Gradient vanishing or explosion when integrating TCFormer into the HRNet codebase
Dear Authors,
Thanks for making this solid work publicly available.
I am trying to integrate this method into this popular codebase (HRNet), but I ran into gradient vanishing or explosion during training ("NaN or Inf found in input tensor."). It happens at around epoch 49:
Epoch: [49][600/4682] Time 0.274s (0.406s) Speed 116.8 samples/s Data 0.000s (0.012s) Loss 0.00058 (0.00060) Accuracy 0.638 (0.668)
Epoch: [49][700/4682] Time 0.409s (0.402s) Speed 78.2 samples/s Data 0.000s (0.012s) Loss 0.00052 (0.00060) Accuracy 0.672 (0.667)
Epoch: [49][800/4682] Time 0.522s (0.403s) Speed 61.3 samples/s Data 0.000s (0.011s) Loss nan (nan) Accuracy 0.000 (0.628)
NaN or Inf found in input tensor.
Epoch: [49][900/4682] Time 0.534s (0.397s) Speed 59.9 samples/s Data 0.000s (0.011s) Loss nan (nan) Accuracy 0.000 (0.558)
NaN or Inf found in input tensor.
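To pin down exactly which logged iteration first diverges, the training log can be scanned programmatically. This is a minimal sketch, assuming the log keeps the HRNet-style `Epoch: [e][i/n] ... Loss current (average)` format shown above; the function name `first_non_finite_loss` is hypothetical and not part of either codebase.

```python
import math
import re

# Matches HRNet-style progress entries such as
# "Epoch: [49][800/4682] ... Loss nan (nan) Accuracy 0.000 (0.628)".
# Groups: epoch, iteration, iterations per epoch, current loss, running average.
LOG_PATTERN = re.compile(r"Epoch: \[(\d+)\]\[(\d+)/(\d+)\].*?Loss (\S+) \((\S+)\)")


def first_non_finite_loss(log_text):
    """Return (epoch, iteration) of the first entry whose current loss is
    NaN or +/-Inf, or None if every logged loss is finite."""
    for match in LOG_PATTERN.finditer(log_text):
        epoch, step, _total, loss, _avg = match.groups()
        # float() parses "nan", "inf" and "-inf" as well as ordinary numbers.
        if not math.isfinite(float(loss)):
            return int(epoch), int(step)
    return None
```

Applied to the log above, this would report epoch 49, iteration 800. Because logging happens only every 100 iterations, the actual first bad batch lies somewhere between iterations 700 and 800.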
I would like to know whether you have encountered similar problems, or whether TCFormer is less stable than other methods such as HRNet and SimpleBaseline.
We haven't encountered similar problems. The message reads "NaN or Inf found in input tensor.", so how about checking the input again?
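One way to follow this suggestion is to assert that every tensor in a training step is finite, so the run stops at the first bad batch instead of continuing with NaN losses. This is a minimal PyTorch sketch; the helper names `assert_finite` and `check_step`, and the placement shown in the comments (using the variable names `input`, `target`, `target_weight`, `output`, `loss`), are assumptions for illustration, not code from TCFormer or HRNet.

```python
import torch


def assert_finite(name, tensor):
    """Raise FloatingPointError if `tensor` holds any NaN or +/-Inf entry."""
    finite = torch.isfinite(tensor)
    if not bool(finite.all()):
        n_bad = int((~finite).sum())
        raise FloatingPointError(f"{name}: {n_bad} non-finite value(s)")


def check_step(tensors):
    """Check each named tensor of one training step, in insertion order.

    `tensors` is a dict such as {"input": ..., "target": ...}; the key
    names are illustrative only and appear in the error message.
    """
    for name, tensor in tensors.items():
        assert_finite(name, tensor.detach())


# Hypothetical placement inside a training loop:
#   check_step({"input": input, "target": target, "target_weight": target_weight})
#   output = model(input)
#   loss = criterion(output, target, target_weight)
#   check_step({"output": output, "loss": loss})
```

If the inputs turn out to be clean but the output or loss is not, the problem lies in the forward or backward pass rather than the data. In that case `torch.autograd.set_detect_anomaly(True)` can additionally report the backward operation that first produced a NaN, at a considerable cost in speed.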