[lazyinit] add correctness verification

Open ver217 opened this issue 1 year ago • 1 comment

📌 Checklist before creating the PR

[x] I have created an issue for this PR for traceability
[x] The title follows the standard format: [doc/gemini/tensor/...]: A concise description
[x] I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

Closes #3134

📝 What does this PR do?

Summarize your work here. If you have any plots/diagrams/screenshots/tables, please attach them here.

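Add correctness verification for lazy initialization across several model collections (Torchvision, Diffusers, Timm, Transformers, Torchaudio).

Known issue: some parameters of some models may not be lazily initialized and remain eager.

The report below lists, for each model class, how many parameters and buffers were lazily initialized out of the total, and the element count (in M) of the tensors that were not.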

Torchvision

model class	param lazy rate	buffer lazy rate	non-lazy numel
AlexNet	16/16	0/0	0.000 M
DenseNet	364/364	363/363	0.000 M
EfficientNet	213/213	147/147	0.000 M
GoogLeNet	187/187	177/177	0.000 M
Inception3	292/292	288/288	0.000 M
MobileNetV2	158/158	156/156	0.000 M
MobileNetV3	142/142	102/102	0.000 M
MNASNet	158/158	156/156	0.000 M
ResNet	62/62	60/60	0.000 M
RegNet	215/215	213/213	0.000 M
ResNet	161/161	159/159	0.000 M
ShuffleNetV2	170/170	168/168	0.000 M
SqueezeNet	52/52	0/0	0.000 M
VGG	22/22	0/0	0.000 M
ResNet	161/161	159/159	0.000 M
VisionTransformer	152/152	0/0	0.000 M
ConvNeXt	344/344	0/0	0.000 M
SwinTransformer	173/173	0/12	0.027 M
EfficientNet	452/452	330/330	0.000 M

Diffusers

model class	param lazy rate	buffer lazy rate
AutoencoderKL	92/92	0/0
VQModel	93/93	0/0
CLIPModel	398/398	2/2
CLIPTextModel	196/196	1/1
CLIPVisionModel	199/199	1/1
UNet2DModel	432/432	0/0

Timm

model class	param lazy rate	buffer lazy rate	non-lazy numel
ResNet	263/263	213/213	0.000 M
Beit	199/199	24/24	0.000 M
Cait	476/476	0/0	0.000 M
ConvMixer	262/262	195/195	0.000 M
EfficientNet	649/649	471/471	0.000 M
MlpMixer	150/150	0/0	0.000 M
VisionTransformer	152/152	0/0	0.000 M
VisionTransformerDistilled	155/155	0/0	0.000 M
Beit	199/199	24/24	0.000 M
CoaT	152/152	0/0	0.000 M
VisionTransformer	176/176	0/0	0.000 M
NormFreeNet	128/185	0/0	20.765 M
EfficientFormer	181/181	99/100	0.002 M
VovNet	93/93	69/69	0.000 M
MlpMixer	102/150	0/0	7.633 M
MlpMixer	306/306	0/0	0.000 M
MobileNetV3	138/138	102/102	0.000 M
HighResolutionNet	279/279	273/273	0.000 M
InceptionV3	284/284	282/282	0.000 M
MlpMixer	150/150	0/0	0.000 M
NormFreeNet	243/347	0/0	40.431 M
NormFreeNet	174/228	0/0	3.946 M
RegNet	293/293	198/198	0.000 M
ResNet	118/118	108/108	0.000 M
TNT	351/351	0/0	0.000 M
ResNet	161/161	159/159	0.000 M
ConViT	180/180	0/0	0.000 M
NormFreeNet	176/233	0/0	44.327 M
ConvNeXt	344/344	0/0	0.000 M
VGG	22/22	0/0	0.000 M
DPN	217/217	216/216	0.000 M
DenseNet	364/364	363/363	0.000 M
ReXNetV1	227/227	186/186	0.000 M
SwinTransformer	329/329	11/35	0.055 M

Transformers

model class	param lazy rate	buffer lazy rate	non-lazy numel
AlbertModel	24/25	2/2	3.662 M
AlbertForPreTraining	30/34	2/2	7.381 M
AlbertForMaskedLM	26/30	2/2	7.381 M
AlbertForSequenceClassification	26/27	2/2	3.662 M
AlbertForTokenClassification	24/25	2/2	3.662 M
AlbertForQuestionAnswering	24/25	2/2	3.662 M
AlbertForMultipleChoice	26/27	2/2	3.662 M
BertModel	38/39	2/2	3.726 M
BertForPreTraining	44/48	2/2	7.510 M
BertLMHeadModel	40/44	2/2	7.510 M
BertForMaskedLM	40/44	2/2	7.510 M
BertForSequenceClassification	40/41	2/2	3.726 M
BertForTokenClassification	38/39	2/2	3.726 M
BertForNextSentencePrediction	40/41	2/2	3.726 M
BertForMultipleChoice	40/41	2/2	3.726 M
GPT2Model	28/28	4/4	0.000 M
GPT2LMHeadModel	28/29	4/4	36.809 M
GPT2DoubleHeadsModel	30/31	4/4	36.809 M
GPT2ForTokenClassification	30/30	4/4	0.000 M
GPT2ForSequenceClassification	29/29	4/4	0.000 M
OPTModel	35/36	0/0	6.137 M
OPTForCausalLM	35/37	0/0	12.273 M
T5Model	47/47	0/0	0.000 M
T5ForConditionalGeneration	47/48	0/0	3.922 M
T5EncoderModel	19/19	0/0	0.000 M

Torchaudio

model class	param lazy rate	buffer lazy rate	non-lazy numel
Conformer	120/120	12/12	0.000 M
ConvTasNet	343/343	0/0	0.000 M
DeepSpeech	18/18	0/0	0.000 M
Emformer	64/64	0/0	0.000 M
Wav2Letter	24/24	0/0	0.000 M
Wav2Letter	22/22	0/0	0.000 M
WaveRNN	36/36	15/15	0.000 M
Tacotron2	60/60	24/24	0.000 M
Wav2Vec2Model	×	×	×
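
The columns in the tables above could be derived along the lines of the following minimal sketch. It is not the code from this PR: the function name `lazy_report`, the `is_lazy` predicate, and the `unit` divisor are all hypothetical, and the PR does not state what "M" means, so the default of 2**20 elements is an assumption. Counting buffers toward the non-lazy numel is inferred from the SwinTransformer rows, where every parameter is lazy but the numel is still non-zero.

```python
from typing import Callable, Iterable, Tuple


def lazy_report(
    params: Iterable,
    buffers: Iterable,
    is_lazy: Callable[[object], bool],
    unit: int = 2**20,  # assumed meaning of "M"; not stated in the PR
) -> Tuple[str, str, str]:
    """Return (param lazy rate, buffer lazy rate, non-lazy numel) as strings.

    `params` and `buffers` are any objects exposing `.numel()`;
    `is_lazy` is a caller-supplied (hypothetical) check for "still lazy".
    """
    params, buffers = list(params), list(buffers)
    lazy_params = sum(1 for p in params if is_lazy(p))
    lazy_buffers = sum(1 for b in buffers if is_lazy(b))
    # Total element count of every tensor that was created eagerly.
    non_lazy_numel = sum(t.numel() for t in params + buffers if not is_lazy(t))
    return (
        f"{lazy_params}/{len(params)}",
        f"{lazy_buffers}/{len(buffers)}",
        f"{non_lazy_numel / unit:.3f} M",
    )
```

With a PyTorch model, `params` and `buffers` would typically be `model.parameters()` and `model.buffers()`, and `is_lazy` would test for whichever lazy tensor type the lazy-init context produces.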

💥 Checklist before requesting a review

[x] I have linked my PR to an issue (instruction)
[x] My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
[x] I have performed a self-review of my code
[x] I have added thorough tests.
[x] I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

[x] 🌝 Yes, I do.
[ ] 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

Mar 16 '23 09:03 ver217

The PyTorch version in CI is 1.11, which is incompatible with meta tensors, so I ran the tests on my local machine:

Mar 16 '23 09:03 ver217
