transformer-xl
{BUG} Semantic error: mixed use of `model` and `para_model`
In the PyTorch implementation there is a mix of `para_model` and `model`. Shouldn't only `para_model` be used?
For example, in the training function, line 422 calls `model.zero_grad()`, but line 436 then calls `ret = para_model(data_i, target_i, *mems[i])`.
Shouldn't the whole program use `para_model`?
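For reference, a minimal sketch of the pattern in question (using a plain `nn.Linear` as a stand-in for the repo's actual model): `nn.DataParallel` wraps the underlying module without copying its parameters, so gradients produced through `para_model` land on `model`'s parameters, and `model.zero_grad()` clears those same gradients. This is only an illustration of the shared-parameter behavior, not the repo's exact code:

```python
import torch
import torch.nn as nn

# Stand-in for the actual model built in train.py.
model = nn.Linear(4, 2)

# Same wrapping pattern as in train.py: para_model shares model's parameters.
para_model = nn.DataParallel(model)
assert para_model.module is model  # no copy is made

# Forward/backward through the DataParallel wrapper...
loss = para_model(torch.randn(3, 4)).sum()
loss.backward()
assert model.weight.grad is not None  # ...accumulates grads on model itself

# Zeroing on the underlying module clears those very same gradients.
model.zero_grad()
grad = model.weight.grad
assert grad is None or bool((grad == 0).all())
```

So calling `model.zero_grad()` while running forward passes through `para_model` operates on one and the same set of parameters; whether mixing the two names is still confusing style-wise is a separate question.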