nanoGPT
PyTorch nn.LayerNorm now takes bias arg - removed custom class
Hi, I noticed that the PyTorch nn.LayerNorm class now takes a bias arg. This PR removes the custom LayerNorm class and replaces it with the built-in.
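For context, the change is roughly the following. This is just a minimal before/after sketch, assuming nanoGPT's existing model.py layout (names like `ndim`, `config.n_embd`, and `config.bias` follow that file):

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

# Before: the custom LayerNorm, needed only because older versions of
# nn.LayerNorm had no way to disable the bias term.
class LayerNorm(nn.Module):
    """LayerNorm but with an optional bias."""

    def __init__(self, ndim, bias):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(ndim))
        self.bias = nn.Parameter(torch.zeros(ndim)) if bias else None

    def forward(self, x):
        return F.layer_norm(x, self.weight.shape, self.weight, self.bias, 1e-5)

# After: nn.LayerNorm accepts bias directly (PyTorch >= 2.1), so the
# custom class can be dropped and call sites become e.g.
# self.ln_1 = nn.LayerNorm(config.n_embd, bias=config.bias)
```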
I tested the qualitative effect of this change by checking out a fresh version of the master branch, then running:
python data/shakespeare_char/prepare.py
python train.py config/train_shakespeare_char.py
I ran the same commands after making the code changes, and compared the results after 1000 iters on an RTX 6000 Ada. The eval results were:
step 1000: train loss 1.2743, val loss 1.5198
step 1000: train loss 1.2760, val loss 1.5265
Not identical, but it seems to be working well enough. A sample taken after 2000 iters:
$ python sample.py --out_dir=out-shakespeare-char
Overriding: out_dir = out-shakespeare-char
number of parameters: 10.65M
Loading meta from data/shakespeare_char/meta.pkl...
KING RICHARD III:
The last through thy beauteous graves
Beating her brother uncontrary'd.
DUKE OF YORK:
Prove, my lord, I'll not speak to thy course;
And that might send in this maid overth and the king
And and selfsame to my life, once by a crown,
And why he's not known to-day sweet to woe more.
KING RICHARD III:
Then at this is a gentleman poor great to
The woman's part son; and therefore you are the prince
Of your hands, being, you are advance.
EDWARD:
It is true.
Just adding that I also noticed the bias arg is there and did a bunch of measurements, including on GPT-2 owt, and the loss delta is negligible. I ran the experiments on a 2x 4090 setup.