Hydra
Weights are shared across the MLP layers.
See https://github.com/linkedin/Liger-Kernel/pull/269.
Confirmed that the weights are shared for Vicuna 7B.
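A minimal sketch of how such sharing can be verified, assuming `model` is any loaded `torch.nn.Module` (e.g. the Vicuna 7B checkpoint); this is an illustration, not code from the PR:

```python
import torch

def report_shared_weights(model: torch.nn.Module) -> None:
    """Group parameter names by underlying storage so aliased (shared)
    weights show up together; any group of size > 1 indicates sharing."""
    by_ptr: dict[int, list[str]] = {}
    # remove_duplicate=False keeps aliased parameters in the listing
    for name, param in model.named_parameters(remove_duplicate=False):
        by_ptr.setdefault(param.data_ptr(), []).append(name)
    for ptr, names in by_ptr.items():
        if len(names) > 1:
            print(f"shared storage {ptr:#x}: {', '.join(names)}")
```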
~~Also, for a reason I couldn't find, not all layers have a res_connection linear layer.~~
Finally, from the same screenshot, the prefix_embeding_layer has an unused prefix_embeding_layer.embed_tokens.weight parameter.
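One rough way to surface unused parameters like this one (a sketch under assumptions: the model is callable on `input_ids` and returns an HF-style output with `.logits`, which may not match Hydra's actual interface):

```python
import torch

def find_unused_parameters(model: torch.nn.Module,
                           input_ids: torch.Tensor) -> list[str]:
    """Run one forward/backward pass and return the names of trainable
    parameters whose gradient was never populated (a heuristic for
    'unused in the forward graph')."""
    model.zero_grad(set_to_none=True)
    out = model(input_ids)
    # Assumes an HF-style output object with .logits; adjust as needed.
    out.logits.float().sum().backward()
    return [name for name, p in model.named_parameters()
            if p.requires_grad and p.grad is None]
```

Parameters such as prefix_embeding_layer.embed_tokens.weight that never enter the forward graph would show up in the returned list.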