CogVideo What init strategy used when extending 2B model to 5B?

Open • spacegoing opened this issue 1 year ago • 1 comment

Feature request

I would love to hear about the team's experience extending the 2B model to 5B, including initialization methods, training stages, etc.

Aug 30 '24 09:08 spacegoing

These are two different models, and both were trained from scratch; the 5B model was not initialized from the 2B one. The model structures differ somewhat, especially in the embedding part. The other differences are mainly the number of layers and some hyperparameters. Everything else is exactly the same.

Aug 30 '24 11:08 zRzRzRzRzRzRzR
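
For readers who want to see exactly which settings differ between two model variants, a small config diff can help. The sketch below is only an illustration: `diff_configs` is a hypothetical helper, and the keys and values in `small_cfg` and `large_cfg` are placeholders, not the actual CogVideoX 2B or 5B settings.

```python
from typing import Any


def diff_configs(a: dict[str, Any], b: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ.

    A key present in only one config is reported with None on the other side.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Placeholder configs: the keys and values are illustrative assumptions,
# NOT the published configuration of either model.
small_cfg = {"num_layers": 4, "pos_embed": "variant_a", "patch_size": 2}
large_cfg = {"num_layers": 8, "pos_embed": "variant_b", "patch_size": 2}

differences = diff_configs(small_cfg, large_cfg)
for key, (small_value, large_value) in differences.items():
    print(f"{key}: {small_value} -> {large_value}")
```

With real config files loaded into dictionaries, the same helper would list only the fields that differ, such as the embedding type and layer count mentioned in the reply above.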
