torchscale
How to test the model
The codebase provides the training code, but how can I reproduce the evaluation results from the paper 'DeepNet: Scaling Transformers to 1,000 Layers'? Could you please provide the code to reproduce the results in Table 6 and Table 7 of that paper?