
[lazyinit] add correctness verification

Open · ver217 opened this issue 1 year ago · 1 comment

📌 Checklist before creating the PR

  • [x] I have created an issue for this PR for traceability
  • [x] The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • [x] I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

Closes #3134

📝 What does this PR do?

Summarize your work here. If you have any plots, diagrams, screenshots, or tables, please attach them here.

Add correctness verification for lazy initialization across several model collections (Torchvision, Diffusers, Timm, Transformers, Torchaudio).

Known issue: in some models, a few parameters are not lazily initialized and remain eager.
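
For readers unfamiliar with the idea, the kind of check this PR describes can be sketched in plain Python: build a model eagerly, build the same model with deferred initialization, materialize it under the same seed, and compare the values. This is only an illustration under stated assumptions. `LazyParam`, `build_eager`, `build_lazy`, `materialize`, and `verify` are hypothetical names, the "model" is a dict of flat lists, and none of this is Colossal-AI's actual lazy-init API.

```python
import random
from typing import Callable, Dict, List


class LazyParam:
    """Hypothetical stand-in for a lazily initialized parameter.

    It records how to build its values instead of allocating them right away.
    """

    def __init__(self, numel: int, init_fn: Callable[[random.Random, int], List[float]]):
        self.numel = numel
        self._init_fn = init_fn
        self.data = None  # nothing is allocated until materialize() is called

    def materialize(self, rng: random.Random) -> List[float]:
        if self.data is None:
            self.data = self._init_fn(rng, self.numel)
        return self.data


def uniform_init(rng: random.Random, numel: int) -> List[float]:
    # Toy initializer; a real model would use its own init scheme.
    return [rng.uniform(-1.0, 1.0) for _ in range(numel)]


def build_eager(shapes: Dict[str, int], seed: int) -> Dict[str, List[float]]:
    # Eager path: every parameter is allocated and filled immediately.
    rng = random.Random(seed)
    return {name: uniform_init(rng, n) for name, n in shapes.items()}


def build_lazy(shapes: Dict[str, int]) -> Dict[str, LazyParam]:
    # Lazy path: only the recipe for each parameter is stored.
    return {name: LazyParam(n, uniform_init) for name, n in shapes.items()}


def materialize(model: Dict[str, LazyParam], seed: int) -> Dict[str, List[float]]:
    # Replays the recorded initializers in definition order under the given seed.
    rng = random.Random(seed)
    return {name: p.materialize(rng) for name, p in model.items()}


def verify(shapes: Dict[str, int], seed: int = 0) -> bool:
    eager = build_eager(shapes, seed)
    lazy = build_lazy(shapes)
    # Before materialization, no lazy parameter holds data.
    assert all(p.data is None for p in lazy.values())
    # Correctness means the materialized values match the eager ones exactly.
    return materialize(lazy, seed) == eager
```

The comparison only holds because both paths consume the random stream in the same order, which is the property a correctness test of this kind has to pin down.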

Here is a report.

Torchvision

| Model class | Param lazy rate | Buffer lazy rate | Non-lazy numel |
|---|---|---|---|
| AlexNet | 16/16 | 0/0 | 0.000 M |
| DenseNet | 364/364 | 363/363 | 0.000 M |
| EfficientNet | 213/213 | 147/147 | 0.000 M |
| GoogLeNet | 187/187 | 177/177 | 0.000 M |
| Inception3 | 292/292 | 288/288 | 0.000 M |
| MobileNetV2 | 158/158 | 156/156 | 0.000 M |
| MobileNetV3 | 142/142 | 102/102 | 0.000 M |
| MNASNet | 158/158 | 156/156 | 0.000 M |
| ResNet | 62/62 | 60/60 | 0.000 M |
| RegNet | 215/215 | 213/213 | 0.000 M |
| ResNet | 161/161 | 159/159 | 0.000 M |
| ShuffleNetV2 | 170/170 | 168/168 | 0.000 M |
| SqueezeNet | 52/52 | 0/0 | 0.000 M |
| VGG | 22/22 | 0/0 | 0.000 M |
| ResNet | 161/161 | 159/159 | 0.000 M |
| VisionTransformer | 152/152 | 0/0 | 0.000 M |
| ConvNeXt | 344/344 | 0/0 | 0.000 M |
| SwinTransformer | 173/173 | 0/12 | 0.027 M |
| EfficientNet | 452/452 | 330/330 | 0.000 M |

Diffusers

| Model class | Param lazy rate | Buffer lazy rate | Non-lazy numel |
|---|---|---|---|
| AutoencoderKL | 92/92 | 0/0 | 0.000 M |
| VQModel | 93/93 | 0/0 | 0.000 M |
| CLIPModel | 398/398 | 2/2 | 0.000 M |
| CLIPTextModel | 196/196 | 1/1 | 0.000 M |
| CLIPVisionModel | 199/199 | 1/1 | 0.000 M |
| UNet2DModel | 432/432 | 0/0 | 0.000 M |

Timm

| Model class | Param lazy rate | Buffer lazy rate | Non-lazy numel |
|---|---|---|---|
| ResNet | 263/263 | 213/213 | 0.000 M |
| Beit | 199/199 | 24/24 | 0.000 M |
| Cait | 476/476 | 0/0 | 0.000 M |
| ConvMixer | 262/262 | 195/195 | 0.000 M |
| EfficientNet | 649/649 | 471/471 | 0.000 M |
| MlpMixer | 150/150 | 0/0 | 0.000 M |
| VisionTransformer | 152/152 | 0/0 | 0.000 M |
| VisionTransformerDistilled | 155/155 | 0/0 | 0.000 M |
| Beit | 199/199 | 24/24 | 0.000 M |
| CoaT | 152/152 | 0/0 | 0.000 M |
| VisionTransformer | 176/176 | 0/0 | 0.000 M |
| NormFreeNet | 128/185 | 0/0 | 20.765 M |
| EfficientFormer | 181/181 | 99/100 | 0.002 M |
| VovNet | 93/93 | 69/69 | 0.000 M |
| MlpMixer | 102/150 | 0/0 | 7.633 M |
| MlpMixer | 306/306 | 0/0 | 0.000 M |
| MobileNetV3 | 138/138 | 102/102 | 0.000 M |
| HighResolutionNet | 279/279 | 273/273 | 0.000 M |
| InceptionV3 | 284/284 | 282/282 | 0.000 M |
| MlpMixer | 150/150 | 0/0 | 0.000 M |
| NormFreeNet | 243/347 | 0/0 | 40.431 M |
| NormFreeNet | 174/228 | 0/0 | 3.946 M |
| RegNet | 293/293 | 198/198 | 0.000 M |
| ResNet | 118/118 | 108/108 | 0.000 M |
| TNT | 351/351 | 0/0 | 0.000 M |
| ResNet | 161/161 | 159/159 | 0.000 M |
| ConViT | 180/180 | 0/0 | 0.000 M |
| NormFreeNet | 176/233 | 0/0 | 44.327 M |
| ConvNeXt | 344/344 | 0/0 | 0.000 M |
| VGG | 22/22 | 0/0 | 0.000 M |
| DPN | 217/217 | 216/216 | 0.000 M |
| DenseNet | 364/364 | 363/363 | 0.000 M |
| ReXNetV1 | 227/227 | 186/186 | 0.000 M |
| SwinTransformer | 329/329 | 11/35 | 0.055 M |

Transformers

| Model class | Param lazy rate | Buffer lazy rate | Non-lazy numel |
|---|---|---|---|
| AlbertModel | 24/25 | 2/2 | 3.662 M |
| AlbertForPreTraining | 30/34 | 2/2 | 7.381 M |
| AlbertForMaskedLM | 26/30 | 2/2 | 7.381 M |
| AlbertForSequenceClassification | 26/27 | 2/2 | 3.662 M |
| AlbertForTokenClassification | 24/25 | 2/2 | 3.662 M |
| AlbertForQuestionAnswering | 24/25 | 2/2 | 3.662 M |
| AlbertForMultipleChoice | 26/27 | 2/2 | 3.662 M |
| BertModel | 38/39 | 2/2 | 3.726 M |
| BertForPreTraining | 44/48 | 2/2 | 7.510 M |
| BertLMHeadModel | 40/44 | 2/2 | 7.510 M |
| BertForMaskedLM | 40/44 | 2/2 | 7.510 M |
| BertForSequenceClassification | 40/41 | 2/2 | 3.726 M |
| BertForTokenClassification | 38/39 | 2/2 | 3.726 M |
| BertForNextSentencePrediction | 40/41 | 2/2 | 3.726 M |
| BertForMultipleChoice | 40/41 | 2/2 | 3.726 M |
| GPT2Model | 28/28 | 4/4 | 0.000 M |
| GPT2LMHeadModel | 28/29 | 4/4 | 36.809 M |
| GPT2DoubleHeadsModel | 30/31 | 4/4 | 36.809 M |
| GPT2ForTokenClassification | 30/30 | 4/4 | 0.000 M |
| GPT2ForSequenceClassification | 29/29 | 4/4 | 0.000 M |
| OPTModel | 35/36 | 0/0 | 6.137 M |
| OPTForCausalLM | 35/37 | 0/0 | 12.273 M |
| T5Model | 47/47 | 0/0 | 0.000 M |
| T5ForConditionalGeneration | 47/48 | 0/0 | 3.922 M |
| T5EncoderModel | 19/19 | 0/0 | 0.000 M |

Torchaudio

| Model class | Param lazy rate | Buffer lazy rate | Non-lazy numel |
|---|---|---|---|
| Conformer | 120/120 | 12/12 | 0.000 M |
| ConvTasNet | 343/343 | 0/0 | 0.000 M |
| DeepSpeech | 18/18 | 0/0 | 0.000 M |
| Emformer | 64/64 | 0/0 | 0.000 M |
| Wav2Letter | 24/24 | 0/0 | 0.000 M |
| Wav2Letter | 22/22 | 0/0 | 0.000 M |
| WaveRNN | 36/36 | 15/15 | 0.000 M |
| Tacotron2 | 60/60 | 24/24 | 0.000 M |
| Wav2Vec2Model | × | × | × |
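
As a reading aid for the report columns, here is a minimal sketch of how such figures could be computed. It is not the script that produced the report. `TensorInfo`, `lazy_rate`, and `non_lazy_numel_m` are hypothetical names. Two things are assumptions inferred from the numbers alone: that "M" means 1024**2 elements (a single eager tensor of 50257 × 768 elements then gives 36.809 M, the value shown in the GPT2LMHeadModel row), and that the last column counts eager buffers as well as eager parameters (the SwinTransformer rows have fully lazy parameters but a non-zero value).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TensorInfo:
    """Hypothetical record describing one parameter or buffer of a model."""
    name: str
    numel: int
    is_lazy: bool


def lazy_rate(tensors: Iterable[TensorInfo]) -> str:
    # "lazy / total" count, formatted like the rate columns in the report.
    tensors = list(tensors)
    lazy = sum(1 for t in tensors if t.is_lazy)
    return f"{lazy}/{len(tensors)}"


def non_lazy_numel_m(tensors: Iterable[TensorInfo]) -> str:
    # Total element count of tensors that stayed eager.
    # Assumption: "M" is 1024**2 elements, not 10**6.
    total = sum(t.numel for t in tensors if not t.is_lazy)
    return f"{total / 1024**2:.3f} M"


# Usage with made-up tensors: one lazy embedding, one eager output head.
params = [
    TensorInfo("embedding.weight", 1000, is_lazy=True),
    TensorInfo("head.weight", 50257 * 768, is_lazy=False),
]
buffers = [TensorInfo("mask", 1024, is_lazy=True)]

print(lazy_rate(params))                   # param lazy rate
print(lazy_rate(buffers))                  # buffer lazy rate
print(non_lazy_numel_m(params + buffers))  # non-lazy numel
```

Under this reading, a row such as `47/48 | 0/0 | 3.922 M` means that one of 48 parameters stayed eager, the model has no buffers, and the eager tensors hold about 3.9 × 1024**2 elements in total.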

💥 Checklist before requesting a review

  • [x] I have linked my PR to an issue (instruction)
  • [x] My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • [x] I have performed a self-review of my code
  • [x] I have added thorough tests.
  • [x] I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • [x] 🌝 Yes, I do.
  • [ ] 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

ver217 · Mar 16 '23 09:03

The Torch version in CI is 1.11, which is incompatible with meta tensors, so I ran the tests on my local machine:

image
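
One way to keep such tests from failing on an older CI image is to gate them on the installed version. The sketch below is plain Python and makes one loud assumption: the minimum version `(1, 12)` is a placeholder chosen only because the comment says 1.11 is incompatible, and the real threshold is not stated here. `parse_major_minor` and `supports_meta_tensor` are hypothetical helper names.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    # Accepts strings such as "1.11.0" or "1.12.1+cu113".
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_meta_tensor(version: str, minimum: Tuple[int, int] = (1, 12)) -> bool:
    # ASSUMPTION: (1, 12) is an illustrative threshold, not a documented one.
    return parse_major_minor(version) >= minimum


if __name__ == "__main__":
    for v in ("1.11.0", "1.12.1+cu113", "2.0.0"):
        print(v, "->", "run" if supports_meta_tensor(v) else "skip")
```

In a pytest suite, the boolean could feed `pytest.mark.skipif(not supports_meta_tensor(torch.__version__), reason=...)` so that CI reports the tests as skipped and not failed.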

ver217 · Mar 16 '23 09:03