keras-nlp Parameter Tying

Parameter Tying

Open arivero opened this issue 1 year ago • 4 comments

Is your feature request related to a problem? Please describe.

A lot of models, including GPT, use the same weights matrix for the embedding of the input and, transposing it, for the "unembedding" towards logits. This means the same tensor must be in two different layers, or a layer must be able to work with two different kind of inputs.

Describe the solution you'd like

I think the ideal situation is to be able to signal, when creating a layer, that some of the weights must be taken from another layer. This also allows for the slightly more general case when you just create a tensor and provide it as weight for both layers.

Describe alternatives you've considered

In principle it could be possible to do a "soft tying" of weights in two layers by using a loss measuring the difference between both layers and adding it to the regularisation losses of one or both layers, or to the model. It is unclear if it works.

Also, it could be possible to play with the initialization function of the layers to make sure it provides the same tensor to the two layers; this seems feasible but hacky, if it is not given as an official option.

Mar 20 '23 03:03 arivero

keras-nlp keras-nlp copied to clipboard

Parameter Tying

keras-nlp
keras-nlp copied to clipboard