ColossalAI
[lazyinit] add correctness verification
📌 Checklist before creating the PR
- [x] I have created an issue for this PR for traceability
- [x] The title follows the standard format:
[doc/gemini/tensor/...]: A concise description
- [x] I have added relevant tags if possible for us to better distinguish different PRs
🚨 Issue number
Link this PR to your issue with words like fixed to automatically close the linked issue upon merge,
e.g. fixed #1234, closed #1234, resolved #1234
Closes #3134
📝 What does this PR do?
Summarize your work here. If you have any plots/diagrams/screenshots/tables, please attach them here.
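Add correctness verification for lazy initialization across many model sets.
Known issue: for some models, a few params (and, in a few cases, buffers) are not lazily initialized and remain eager.
The report below lists, for each model class, how many params and buffers were lazily initialized and the total size of those that were not.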
Torchvision
model class | param lazy rate | buffer lazy rate | non-lazy numel |
---|---|---|---|
AlexNet | 16/16 | 0/0 | 0.000 M |
DenseNet | 364/364 | 363/363 | 0.000 M |
EfficientNet | 213/213 | 147/147 | 0.000 M |
GoogLeNet | 187/187 | 177/177 | 0.000 M |
Inception3 | 292/292 | 288/288 | 0.000 M |
MobileNetV2 | 158/158 | 156/156 | 0.000 M |
MobileNetV3 | 142/142 | 102/102 | 0.000 M |
MNASNet | 158/158 | 156/156 | 0.000 M |
ResNet | 62/62 | 60/60 | 0.000 M |
RegNet | 215/215 | 213/213 | 0.000 M |
ResNet | 161/161 | 159/159 | 0.000 M |
ShuffleNetV2 | 170/170 | 168/168 | 0.000 M |
SqueezeNet | 52/52 | 0/0 | 0.000 M |
VGG | 22/22 | 0/0 | 0.000 M |
ResNet | 161/161 | 159/159 | 0.000 M |
VisionTransformer | 152/152 | 0/0 | 0.000 M |
ConvNeXt | 344/344 | 0/0 | 0.000 M |
SwinTransformer | 173/173 | 0/12 | 0.027 M |
EfficientNet | 452/452 | 330/330 | 0.000 M |
Diffusers
model class | param lazy rate | buffer lazy rate | non-lazy numel |
---|---|---|---|
AutoencoderKL | 92/92 | 0/0 | 0.000 M |
VQModel | 93/93 | 0/0 | 0.000 M |
CLIPModel | 398/398 | 2/2 | 0.000 M |
CLIPTextModel | 196/196 | 1/1 | 0.000 M |
CLIPVisionModel | 199/199 | 1/1 | 0.000 M |
UNet2DModel | 432/432 | 0/0 | 0.000 M |
Timm
model class | param lazy rate | buffer lazy rate | non-lazy numel |
---|---|---|---|
ResNet | 263/263 | 213/213 | 0.000 M |
Beit | 199/199 | 24/24 | 0.000 M |
Cait | 476/476 | 0/0 | 0.000 M |
ConvMixer | 262/262 | 195/195 | 0.000 M |
EfficientNet | 649/649 | 471/471 | 0.000 M |
MlpMixer | 150/150 | 0/0 | 0.000 M |
VisionTransformer | 152/152 | 0/0 | 0.000 M |
VisionTransformerDistilled | 155/155 | 0/0 | 0.000 M |
Beit | 199/199 | 24/24 | 0.000 M |
CoaT | 152/152 | 0/0 | 0.000 M |
VisionTransformer | 176/176 | 0/0 | 0.000 M |
NormFreeNet | 128/185 | 0/0 | 20.765 M |
EfficientFormer | 181/181 | 99/100 | 0.002 M |
VovNet | 93/93 | 69/69 | 0.000 M |
MlpMixer | 102/150 | 0/0 | 7.633 M |
MlpMixer | 306/306 | 0/0 | 0.000 M |
MobileNetV3 | 138/138 | 102/102 | 0.000 M |
HighResolutionNet | 279/279 | 273/273 | 0.000 M |
InceptionV3 | 284/284 | 282/282 | 0.000 M |
MlpMixer | 150/150 | 0/0 | 0.000 M |
NormFreeNet | 243/347 | 0/0 | 40.431 M |
NormFreeNet | 174/228 | 0/0 | 3.946 M |
RegNet | 293/293 | 198/198 | 0.000 M |
ResNet | 118/118 | 108/108 | 0.000 M |
TNT | 351/351 | 0/0 | 0.000 M |
ResNet | 161/161 | 159/159 | 0.000 M |
ConViT | 180/180 | 0/0 | 0.000 M |
NormFreeNet | 176/233 | 0/0 | 44.327 M |
ConvNeXt | 344/344 | 0/0 | 0.000 M |
VGG | 22/22 | 0/0 | 0.000 M |
DPN | 217/217 | 216/216 | 0.000 M |
DenseNet | 364/364 | 363/363 | 0.000 M |
ReXNetV1 | 227/227 | 186/186 | 0.000 M |
SwinTransformer | 329/329 | 11/35 | 0.055 M |
Transformers
model class | param lazy rate | buffer lazy rate | non-lazy numel |
---|---|---|---|
AlbertModel | 24/25 | 2/2 | 3.662 M |
AlbertForPreTraining | 30/34 | 2/2 | 7.381 M |
AlbertForMaskedLM | 26/30 | 2/2 | 7.381 M |
AlbertForSequenceClassification | 26/27 | 2/2 | 3.662 M |
AlbertForTokenClassification | 24/25 | 2/2 | 3.662 M |
AlbertForQuestionAnswering | 24/25 | 2/2 | 3.662 M |
AlbertForMultipleChoice | 26/27 | 2/2 | 3.662 M |
BertModel | 38/39 | 2/2 | 3.726 M |
BertForPreTraining | 44/48 | 2/2 | 7.510 M |
BertLMHeadModel | 40/44 | 2/2 | 7.510 M |
BertForMaskedLM | 40/44 | 2/2 | 7.510 M |
BertForSequenceClassification | 40/41 | 2/2 | 3.726 M |
BertForTokenClassification | 38/39 | 2/2 | 3.726 M |
BertForNextSentencePrediction | 40/41 | 2/2 | 3.726 M |
BertForMultipleChoice | 40/41 | 2/2 | 3.726 M |
GPT2Model | 28/28 | 4/4 | 0.000 M |
GPT2LMHeadModel | 28/29 | 4/4 | 36.809 M |
GPT2DoubleHeadsModel | 30/31 | 4/4 | 36.809 M |
GPT2ForTokenClassification | 30/30 | 4/4 | 0.000 M |
GPT2ForSequenceClassification | 29/29 | 4/4 | 0.000 M |
OPTModel | 35/36 | 0/0 | 6.137 M |
OPTForCausalLM | 35/37 | 0/0 | 12.273 M |
T5Model | 47/47 | 0/0 | 0.000 M |
T5ForConditionalGeneration | 47/48 | 0/0 | 3.922 M |
T5EncoderModel | 19/19 | 0/0 | 0.000 M |
Torchaudio
model class | param lazy rate | buffer lazy rate | non-lazy numel |
---|---|---|---|
Conformer | 120/120 | 12/12 | 0.000 M |
ConvTasNet | 343/343 | 0/0 | 0.000 M |
DeepSpeech | 18/18 | 0/0 | 0.000 M |
Emformer | 64/64 | 0/0 | 0.000 M |
Wav2Letter | 24/24 | 0/0 | 0.000 M |
Wav2Letter | 22/22 | 0/0 | 0.000 M |
WaveRNN | 36/36 | 15/15 | 0.000 M |
Tacotron2 | 60/60 | 24/24 | 0.000 M |
Wav2Vec2Model | × | × | × |
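
For readers who want to reproduce one row of the tables above, here is a minimal sketch of how the three columns could be computed. It is not the code from this PR. The helper names (`LazyStats`, `collect_stats`, `report_row`) and the `is_lazy` predicate are hypothetical; how a lazy tensor is actually detected depends on the lazy-init implementation and is not described here. The sketch assumes "M" means 1024**2 elements, which appears consistent with the GPT2LMHeadModel row (one eager 50257 × 768 weight gives 36.809), and that the non-lazy numel column sums params and buffers, as the SwinTransformer rows suggest.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Tuple

NamedTensors = Iterable[Tuple[str, Any]]


@dataclass
class LazyStats:
    lazy: int = 0            # tensors that are still lazy
    total: int = 0           # all tensors seen
    non_lazy_numel: int = 0  # elements held by eager (non-lazy) tensors

    @property
    def rate(self) -> str:
        return f"{self.lazy}/{self.total}"


def collect_stats(named_tensors: NamedTensors,
                  is_lazy: Callable[[Any], bool]) -> LazyStats:
    """Count lazy tensors and sum the element counts of eager ones.

    `is_lazy` is a caller-supplied, hypothetical predicate.
    Each tensor only needs a `numel()` method.
    """
    stats = LazyStats()
    for _name, tensor in named_tensors:
        stats.total += 1
        if is_lazy(tensor):
            stats.lazy += 1
        else:
            stats.non_lazy_numel += tensor.numel()
    return stats


def report_row(model_class: str, params: NamedTensors, buffers: NamedTensors,
               is_lazy: Callable[[Any], bool]) -> str:
    """Format one table row in the same layout as the report."""
    p = collect_stats(params, is_lazy)
    b = collect_stats(buffers, is_lazy)
    # Assumption: "M" is 1024**2 elements, summed over params and buffers.
    numel_m = (p.non_lazy_numel + b.non_lazy_numel) / 1024**2
    return f"{model_class} | {p.rate} | {b.rate} | {numel_m:.3f} M |"
```

With a PyTorch module, `params` and `buffers` would typically come from `model.named_parameters()` and `model.named_buffers()`. A row such as `28/29` with a non-zero last column then points directly at the tensors that stayed eager.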
💥 Checklist before requesting a review
- [x] I have linked my PR to an issue (instruction)
- [x] My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
- [x] I have performed a self-review of my code
- [x] I have added thorough tests.
- [x] I have added docstrings for all the functions/methods I implemented
⭐️ Do you enjoy contributing to Colossal-AI?
- [x] 🌝 Yes, I do.
- [ ] 🌚 No, I don't.
Tell us more if you don't enjoy contributing to Colossal-AI.
The Torch version in CI is 1.11, which is incompatible with meta tensors, so I ran the tests on my local machine: