CogVideo
What init strategy was used when extending the 2B model to 5B?
Feature request
Would love to know about the team's experience extending the 2B model to 5B, including init methods, training stages, etc.
These are two different models, and both were trained from scratch, so the 5B model was not initialized from the 2B weights. The model structures are somewhat dissimilar, especially in the embedding part. The other differences are mainly the number of layers and some hyperparameters; everything else is exactly the same.
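Since both models are trained from scratch, there is no weight-transfer step to show. What can be illustrated is the kind of difference described above: two configs that share most fields and differ in depth, a few hyperparameters, and the embedding choice. The sketch below is only an illustration. The `diff_configs` helper, the field names, and every value in `small` and `large` are hypothetical placeholders, not the actual CogVideoX settings.

```python
def diff_configs(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Hypothetical placeholder configs. These are NOT the real 2B/5B values;
# they only mimic the pattern "same design, different depth, width, embedding".
small = {"num_layers": 2, "hidden_size": 8, "pos_embed": "type_a", "patch_size": 2}
large = {"num_layers": 4, "hidden_size": 16, "pos_embed": "type_b", "patch_size": 2}

changed = diff_configs(small, large)
for key, (small_value, large_value) in changed.items():
    print(f"{key}: {small_value} -> {large_value}")
```

Running the same comparison on the real config files of the two checkpoints would show exactly which fields differ; fields that match, like the placeholder `patch_size` here, drop out of the result.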