
TPU/GPU training: KeyError 'pos_emb'

Open tech509201941 opened this issue 3 years ago • 0 comments

Hi,

I am currently testing the char notebook. Everything works fine when training on the CPU, but if I execute the same code on a GPU/TPU, the following error occurs:

Exception has occurred: KeyError 'pos_emb'

If I simply remove the problematic code line:

no_decay.add('pos_emb')

With that line removed, training also runs on GPU/TPU, but the loss gets stuck: practically no improvement is made during training (or it even gets worse). This is unlike CPU training, where the loss visibly oscillates with the same code base.

Can anyone explain to me how it is possible to solve this KeyError without corrupting the no_decay set? Thanks a lot! :)
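
For illustration, here is a minimal sketch of one possible approach. It rests on an assumption that is not confirmed anywhere in this issue: that the KeyError is caused by a name mismatch, for example because the model is wrapped (such as by `torch.nn.DataParallel`, which prefixes parameter names with `module.`). The helper `resolve_param_name` and both name lists are hypothetical; with a real model the names would come from `model.named_parameters()`.

```python
def resolve_param_name(param_names, short_name):
    """Return the registered parameter name that matches short_name.

    Accepts an exact match or a match behind a dotted prefix such as
    'module.' (assumed here to be the cause of the KeyError).
    """
    matches = [
        name for name in param_names
        if name == short_name or name.endswith("." + short_name)
    ]
    if len(matches) != 1:
        raise KeyError(
            f"expected exactly one parameter matching {short_name!r}, found {matches}"
        )
    return matches[0]


# Hypothetical name lists standing in for
# [name for name, _ in model.named_parameters()].
plain_names = ["pos_emb", "tok_emb.weight", "head.weight"]
wrapped_names = ["module.pos_emb", "module.tok_emb.weight", "module.head.weight"]

# Add the positional embedding under whatever name it is actually registered with,
# so that a later param_dict[name] lookup does not fail.
no_decay = set()
no_decay.add(resolve_param_name(wrapped_names, "pos_emb"))
print(no_decay)
```

If the lookup raises here as well, the parameter is not registered under any name ending in `pos_emb`, and printing the full list of parameter names would be the next thing to check.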

tech509201941, Feb 18 '21 16:02