Jiarui Fang(方佳瑞)
Thanks for your feedback, I will have a look.
Did you try this prebuilt image? `docker pull thufeifeibear/turbo_transformers_gpu:latest`
I suppose so. I remember testing it. If you run into a problem, I can rebuild one.
@marsggbo Thanks for your feedback. Can you use GeminiDDP instead of `ShardedModelV2` in your code? See the latest ZeRO implementation here: https://github.com/hpcaitech/ColossalAI/blob/main/examples/language/gpt/train_gpt_demo.py#L161
@Sakura-gh Hello, I reproduced your run script and it works. I used my own dataset.json. Can you check whether your dataset is set up correctly? A simple way is to test...
ZeRO is used in the context of Adam or second-order optimizers. Generally, a DNN trained with SGD does not have memory shortage issues. We can throw an error if the...
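As a rough illustration of why the optimizer choice matters for memory, here is a minimal sketch that counts optimizer-state bytes per parameter. It assumes fp32 states and ignores parameters, gradients, and any mixed-precision master weights; the 1B-parameter model size and the helper name `optimizer_state_bytes` are hypothetical, chosen only for the example.

```python
# Rough optimizer-state memory estimate, assuming fp32 states only
# (illustrative assumption; real setups may also keep fp32 master weights).
BYTES_FP32 = 4


def optimizer_state_bytes(num_params: int, states_per_param: int) -> int:
    """Bytes needed for optimizer states alone (excludes params and grads)."""
    return num_params * states_per_param * BYTES_FP32


num_params = 1_000_000_000  # hypothetical 1B-parameter model

sgd_bytes = optimizer_state_bytes(num_params, 0)       # plain SGD: no extra state
momentum_bytes = optimizer_state_bytes(num_params, 1)  # SGD with momentum: one buffer
adam_bytes = optimizer_state_bytes(num_params, 2)      # Adam: first and second moments

for name, size in [("SGD", sgd_bytes),
                   ("SGD+momentum", momentum_bytes),
                   ("Adam", adam_bytes)]:
    print(f"{name}: {size / 2**30:.2f} GiB of optimizer state")
```

Plain SGD keeps no per-parameter state, while Adam keeps two buffers per parameter, which is the part ZeRO shards across devices.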
I see. We will check it later.
I recommend using an init context to solve the problem rather than changing the `colossal.nn` functionality. The ZeRO init context provides a `target_device` argument to designate the device to init...
Can you provide more information? A code snippet will be more helpful.