Universal-Transformer-Pytorch
Halting probability exceeds threshold at step 2 from the second epoch onwards
Hi,
When I run the model, it reaches the maximum of 24 steps during the first epoch. From the second or third epoch onwards, however, the probability computed by "p = self.sigma(self.p(state)).squeeze(-1)" gets very close to the threshold, and the threshold is exceeded at step 2. As a result, my encoder effectively has only 2 layers. Any idea why?
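
To make the symptom concrete, here is a minimal plain-Python sketch of the halting rule as described above. It assumes the per-step p is accumulated and the loop stops once the running sum exceeds the threshold. The function name halting_step, the 0.9 threshold, and the sample probabilities are illustrative assumptions, not values taken from the repository.

```python
def halting_step(step_probs, threshold=0.9, max_hop=24):
    """Return the 1-based step at which the running sum of per-step
    halting probabilities first exceeds `threshold`.

    If the sum never exceeds the threshold, return `max_hop`.
    Assumption: this mirrors an ACT-style loop; the real model works on
    tensors and tracks this per position.
    """
    cumulative = 0.0
    for step, p in enumerate(step_probs[:max_hop], start=1):
        cumulative += p
        if cumulative > threshold:
            return step
    return max_hop


# Hypothetical first-epoch behaviour: tiny p at every step, so the sum
# stays below the threshold and all 24 steps are used.
first_epoch_steps = halting_step([0.01] * 24)

# Hypothetical later-epoch behaviour: p around 0.5 at every step, so the
# sum is 0.5 after step 1 and 1.0 after step 2, which exceeds 0.9.
later_epoch_steps = halting_step([0.5] * 24)

print(first_epoch_steps, later_epoch_steps)
```

Under these assumptions, any p of roughly 0.45 or more at each step is enough to stop at step 2, so logging the mean of p per step during training may show whether the sigmoid output is drifting upwards after the first epoch.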