[DOC]: Add some clarification about ZeroInitContext, ColoInitContext, and colossalai.initialize
📚 The doc issue
There is an issue raised in the Slack channel about their differences. I have posted a preliminary answer based on my inspection of our source code. Please polish it and make it accessible in our official documentation system. Thanks!
Great suggestion. In fact, we are trying to refactor this repository a bit in the future. We will release our design and development plan soon.
Hey, I'm also confused about the difference between ColoInitContext and ZeroInitContext. I noticed that in the example/ dir you use different init contexts for different language models. Could you clarify the differences?
Also, I cannot open the link https://colossalaiworkspace.slack.com/archives/C02NAJARJ9Y/p1676959657443859?thread_ts=1676900155.658179&cid=C02NAJARJ9Y
FYI,
Hi, ColoInitContext and ZeroInitContext are not compatible with each other, so you may use either one, but not both. ColoInitContext initializes the torch module with ColoParameters that carry distributed specifications. ZeroInitContext auto-shards the parameters onto devices according to the strategy you provide.
colossalai.initialize is an orthogonal method that wraps the torch model and dataloader so that users can plug them into our training loops. It is mainly for ease of use.
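To illustrate the "wrapping" idea above, here is a toy sketch in plain Python. This is NOT ColossalAI's actual implementation; the `Engine` class and `initialize` function below are hypothetical stand-ins that only mirror the general shape of wrapping a model, optimizer, and dataloader into one engine object so the training loop looks the same regardless of the parallel strategy underneath.

```python
# Toy analogy (not real ColossalAI code): wrap model + optimizer + dataloader
# into an "engine" so the user-facing training loop stays unchanged.

class Engine:
    """Hypothetical stand-in for the engine returned by an initialize-style call."""

    def __init__(self, model, optimizer):
        self.model = model
        self.optimizer = optimizer

    def __call__(self, x):
        # Forward pass delegates to the wrapped model.
        return self.model(x)

    def step(self):
        # A real engine would also handle grad scaling, ZeRO bookkeeping, etc.
        self.optimizer.step()


def initialize(model, optimizer, dataloader):
    """Hypothetical helper mirroring the (engine, dataloader) return shape."""
    return Engine(model, optimizer), dataloader


# Usage with trivial stand-ins for a model and optimizer:
model = lambda x: x * 2

class Opt:
    steps = 0
    def step(self):
        self.steps += 1

engine, loader = initialize(model, Opt(), [1, 2, 3])
outs = [engine(x) for x in loader]
print(outs)  # [2, 4, 6]
```

The point is only that the caller writes `engine(x)` and `engine.step()` instead of touching the model and optimizer directly, which is what makes the surrounding training loop strategy-agnostic.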
To summarize, you may try ZeroInitContext when using ZeRO strategies (with DDP) and ColoInitContext when applying tensor parallelism. You can always use colossalai.initialize for ease of training.
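To make "auto-shards the parameters onto devices" concrete, here is a minimal pure-Python sketch of the ZeRO-style idea, assuming an even split across ranks. This is a conceptual toy, not ColossalAI's API: with a ZeRO-style init context, each rank ends up holding only its own shard of every parameter, so per-rank memory is roughly `total / world_size` instead of the full parameter.

```python
# Toy illustration (not the real ZeroInitContext): shard one flat "parameter"
# across ranks so each rank stores only its own slice.

def shard_param(param, rank, world_size):
    """Return the slice of `param` owned by `rank` (padding ignored for simplicity)."""
    n = len(param)
    chunk = (n + world_size - 1) // world_size  # ceil division
    return param[rank * chunk:(rank + 1) * chunk]

# A "parameter" of 8 values, sharded across 4 ranks.
param = list(range(8))
world_size = 4
shards = [shard_param(param, r, world_size) for r in range(world_size)]

print(shards)                       # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(max(len(s) for s in shards))  # 2 -> each rank stores 1/4 of the parameter
```

ColoInitContext, by contrast, materializes the full parameters as ColoParameters annotated with distribution specs; the actual placement is then decided by the tensor-parallel layout rather than by an eager per-rank shard at init time.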
@JThh Thanks for your explanation. I still have some questions:
- You said "ZeroInitContext auto-shards the parameters onto devices according to your provided strategy." If I use GeminiDDP (which supports ZeRO and Gemini) combined with ColoInitContext, will it shard parameters?
- I tried to load an HF model with ColoInitContext and it works, but when I switch to ZeroInitContext, it ends with a CUDA OOM. This is a little weird to me, since I thought ZeroInitContext could shard parameters and use less GPU memory per device. Why would this happen?
Hi @laozhanghahaha, welcome to check out our new LazyInit (#2770). We have updated a lot.