transformer-xl
Running the PyTorch run_wt103_large.sh script prints 285170506 parameters, but the paper reports 128M, and training runs out of memory:
RuntimeError: CUDA out of memory.
My GPU has 11441 MiB of memory.
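A rough back-of-envelope check of how much of that memory the parameter count alone may consume. This is a sketch under stated assumptions, not a measurement of the actual script: it assumes fp32 weights (4 bytes each), one fp32 gradient per weight, and an Adam-style optimizer keeping two fp32 moment buffers per weight. It ignores activations, the Transformer-XL recurrence memory, and CUDA workspace. The helper names are hypothetical.

```python
# Back-of-envelope static training memory for a model of a given size.
# Assumptions (not taken from the issue): fp32 everywhere, one gradient per
# weight, and two optimizer moment buffers per weight (Adam-style).
# Activations and the Transformer-XL recurrence memory are NOT included.

BYTES_PER_FP32 = 4


def static_training_bytes(n_params: int, optimizer_states: int = 2) -> int:
    """Bytes for weights + gradients + optimizer state (hypothetical helper)."""
    copies = 1 + 1 + optimizer_states  # weights, gradients, optimizer moments
    return n_params * copies * BYTES_PER_FP32


def to_mib(n_bytes: int) -> float:
    """Convert bytes to MiB (2**20 bytes)."""
    return n_bytes / 2**20


reported = 285_170_506  # count printed by run_wt103_large.sh
quoted = 128_000_000    # model size quoted in this issue

for name, n in [("reported", reported), ("quoted", quoted)]:
    print(f"{name}: {to_mib(static_training_bytes(n)):.0f} MiB before activations")
```

Under these assumptions the reported count needs roughly 4351 MiB before any activations, versus about 1953 MiB for 128M parameters. Whatever remains of the 11441 MiB card has to hold activations and the recurrence memory, which depend on batch size and sequence/memory length and are not modeled here.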
How can I reproduce the 128M model?
Thank you @kimiyoung @zihangdai
Same problem here. Even on a system with 8 × Titan V (32G) GPUs, I still run into the OOM error.