minGPT icon indicating copy to clipboard operation
minGPT copied to clipboard

What is the purpose of `c_proj` here?

Open brynhayder opened this issue 2 months ago • 1 comments

https://github.com/karpathy/minGPT/blob/37baab71b9abea1b76ab957409a1cc2fbfba8a26/mingpt/model.py#L42

Why do we need an additional linear transformation after the MHA and before the MLP when the dimensions are the same?

(I understand that this is how the initial transformer implementation was written, but I took this operation to be for dimension consistency between sequential attention operations. It seems superfluous here since the first linear in the MLP can already take linear combinations of the attention outputs.)

brynhayder avatar Apr 10 '24 09:04 brynhayder