nanoGPT
PyTorch nn.LayerNorm now takes bias arg - removed custom class
Hi, I noticed that the PyTorch nn.LayerNorm class now takes a bias arg. This PR removes the custom LayerNorm class and replaces it with the built-in.
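For context, the change is roughly the following. This is just a minimal before/after sketch, assuming nanoGPT's existing model.py layout (names like `ndim`, `config.n_embd`, and `config.bias` follow that file):

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

# Before: the custom LayerNorm, needed only because older versions of
# nn.LayerNorm had no way to disable the bias term.
class LayerNorm(nn.Module):
    """LayerNorm but with an optional bias."""

    def __init__(self, ndim, bias):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(ndim))
        self.bias = nn.Parameter(torch.zeros(ndim)) if bias else None

    def forward(self, x):
        return F.layer_norm(x, self.weight.shape, self.weight, self.bias, 1e-5)

# After: nn.LayerNorm accepts bias directly (PyTorch >= 2.1), so the
# custom class can be dropped and call sites become e.g.
# self.ln_1 = nn.LayerNorm(config.n_embd, bias=config.bias)
```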
I tested the qualitative effect of this change by checking out a fresh version of the master branch, then running:
python data/shakespeare_char/prepare.py
python train.py config/train_shakespeare_char.py
I ran the same commands after making the code changes, and compared the results after 1000 iters on an RTX 6000 Ada. The eval results were:
step 1000: train loss 1.2743, val loss 1.5198
step 1000: train loss 1.2760, val loss 1.5265
Not identical, but it seems to be working well enough. A sample taken after 2000 iters:
$ python sample.py --out_dir=out-shakespeare-char
Overriding: out_dir = out-shakespeare-char
number of parameters: 10.65M
Loading meta from data/shakespeare_char/meta.pkl...
KING RICHARD III:
The last through thy beauteous graves
Beating her brother uncontrary'd.
DUKE OF YORK:
Prove, my lord, I'll not speak to thy course;
And that might send in this maid overth and the king
And and selfsame to my life, once by a crown,
And why he's not known to-day sweet to woe more.
KING RICHARD III:
Then at this is a gentleman poor great to
The woman's part son; and therefore you are the prince
Of your hands, being, you are advance.
EDWARD:
It is true.
Just adding that I also noticed the bias arg is there and did a bunch of measurements, including on GPT-2 owt, and the loss delta is negligible. I ran the experiments on a 2x 4090 setup.