aft-pytorch
I tested the model on an NLP task, using the AFTFull model with 6 layers. In __init__ I build the layers with this code:
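import torch.nn as nn
from aft_pytorch import AFTFull
# inside the model's __init__: a stack of six AFTFull layers
self.encoder_transformer = nn.ModuleList()
for _ in range(6):
    self.encoder_transformer.append(AFTFull(max_seqlen=500, dim=512, hidden_dim=256))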
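One plausible source of the NaN, offered as an assumption rather than a confirmed diagnosis: AFT-full weights each value by exp(K_t' + w_{t,t'}), and in float32 torch.exp overflows to inf once its argument exceeds roughly 88.7, after which inf/inf evaluates to NaN. If a layer exponentiates K and the position biases directly, six stacked layers with plain residual additions give the activations room to grow until that happens. The hypothetical StableAFTFull below is a sketch of the AFT-full formula, not the aft-pytorch code; it subtracts the row-wise maximum of w and the per-feature maximum of K over time before calling exp, which scales the numerator and denominator by the same factor and so leaves the result unchanged.

```python
import torch
import torch.nn as nn


class StableAFTFull(nn.Module):
    """Hypothetical AFT-full layer (not the aft-pytorch implementation).

    Y_t = sigmoid(Q_t) * sum_t' exp(w[t,t'] + K_t') * V_t'
                        / sum_t' exp(w[t,t'] + K_t')
    """

    def __init__(self, max_seqlen: int, dim: int, hidden_dim: int = 64):
        super().__init__()
        self.to_q = nn.Linear(dim, hidden_dim)
        self.to_k = nn.Linear(dim, hidden_dim)
        self.to_v = nn.Linear(dim, hidden_dim)
        self.project = nn.Linear(hidden_dim, dim)
        # Learned pairwise position biases w[t, t'].
        self.wbias = nn.Parameter(torch.zeros(max_seqlen, max_seqlen))
        nn.init.xavier_uniform_(self.wbias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) with seq_len <= max_seqlen
        T = x.shape[1]
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)  # (B, T, H) each
        w = self.wbias[:T, :T]                                # (T, T)
        # Shift so every exp() argument is <= 0. The shifts multiply the
        # numerator and denominator by the same factor, so the ratio is unchanged.
        exp_w = torch.exp(w - w.max(dim=1, keepdim=True).values)  # (T, T)
        exp_k = torch.exp(k - k.max(dim=1, keepdim=True).values)  # (B, T, H)
        # (T, T) @ (B, T, H) broadcasts over the batch -> (B, T, H)
        num = exp_w @ (exp_k * v)
        den = exp_w @ exp_k
        # clamp_min guards against the denominator underflowing to zero
        y = torch.sigmoid(q) * num / den.clamp_min(1e-12)
        return self.project(y)
```

It takes the same constructor arguments as the AFTFull call above, e.g. StableAFTFull(max_seqlen=500, dim=512, hidden_dim=256), so it can be swapped into the same ModuleList to check whether the NaN goes away.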
and in forward() I apply them like this:
for layer in self.encoder_transformer:
    x = layer(x) + x  # residual connection around each AFT layer
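For the question of stacking many layers: a widely used pattern for deep Transformer-style encoders (general practice, not something this thread or the aft-pytorch README prescribes) is the pre-norm residual block x = x + layer(LayerNorm(x)), plus a final LayerNorm. Compared with the bare x = layer(x) + x above, each layer then sees normalized inputs regardless of how large the residual stream has grown. PreNormResidual, Encoder, and make_layer are hypothetical names; the sketch accepts any module that maps (batch, seq_len, dim) to the same shape.

```python
import torch
import torch.nn as nn


class PreNormResidual(nn.Module):
    """Hypothetical wrapper: x + layer(LayerNorm(x))."""

    def __init__(self, dim: int, layer: nn.Module):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.layer = layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize only the layer's input; the residual path stays untouched.
        return x + self.layer(self.norm(x))


class Encoder(nn.Module):
    """Hypothetical encoder: num_layers pre-norm blocks plus a final LayerNorm."""

    def __init__(self, make_layer, num_layers: int = 6, dim: int = 512):
        super().__init__()
        # make_layer is called once per block so no weights are shared.
        self.blocks = nn.ModuleList(
            [PreNormResidual(dim, make_layer()) for _ in range(num_layers)]
        )
        self.final_norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            x = block(x)
        return self.final_norm(x)
```

With aft-pytorch installed, the encoder from this issue would be Encoder(lambda: AFTFull(max_seqlen=500, dim=512, hidden_dim=256), num_layers=6, dim=512); the lambda ensures every block gets its own freshly initialized layer.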
Originally I used a standard Transformer encoder; after replacing it with this, the training loss became NaN. Is something wrong? And how should the model be used with many layers? Please help me, thank you.
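To locate where the NaN first appears, one option is a forward hook on every submodule that records the name of any module whose output contains inf or NaN. Because a child's hook fires before its parent's, the first name recorded in a forward pass is the innermost module that produced a non-finite value. add_nan_hooks is a hypothetical helper built on PyTorch's register_forward_hook.

```python
import torch
import torch.nn as nn


def add_nan_hooks(model: nn.Module, report: list) -> list:
    """Hypothetical helper: append the name of every module whose output is non-finite."""
    handles = []
    for name, module in model.named_modules():
        # name=name binds the current loop value into each hook.
        def hook(mod, inputs, output, name=name):
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                report.append(name)
        handles.append(module.register_forward_hook(hook))
    return handles
```

Calling h.remove() on each returned handle detaches the hooks after debugging. If the forward pass stays finite and the NaN only shows up after an optimizer step, torch.autograd.set_detect_anomaly(True) makes autograd raise an error at the backward operation that produced NaN, and torch.nn.utils.clip_grad_norm_ is a common safeguard against exploding gradients.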
Hey, thanks. I'll get into it asap. Give me a while!