XenonLamb

Results: 9 comments by XenonLamb

Have you found out how to fix this?

> @XenonLamb Hello, I'd like to ask if you have successfully trained using zero3 or zero3_offload. I used the zero3.json provided by llava. However, I encountered some problems when loading...

P.S. The way my case hangs seems similar to this issue: https://github.com/huggingface/transformers/issues/28803 . However, even after upgrading accelerate to 0.30.0, the issue is still not resolved.

@iamsile Has your case been resolved with the latest DeepSpeed version? I observed similar issues recently, typically with a BERT model and some linear layers under ZeRO-3; the training process...

To provide some context, here is the result file I obtained after running the evaluation script: [results (2).json](https://github.com/dvlab-research/LLaMA-VID/files/13881970/results.2.json)

Thank you! May I ask which api_base you used for evaluation? I found that GPT's behavior seems to differ for gpt-3.5-turbo on my api base, which caused about a 7% difference in...

> Hi, we use the purchased API base. We tested several times and did not find such a huge gap. Are other packages kept the same, like transformers?

Yes, the...

I also observed similar results to yours... I tried to start from MiniGPTV1's stages 1 and 2 by adding the Caption/REC/REG/VQA datasets mentioned in the v2 paper (except for GRIT...

> @caoyu-noob, you can use the `zero_to_fp32.py` script to convert the zero3 checkpoints into a regular pytorch checkpoint. You can find documentation of this script and other checkpoint conversion options...
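For context on what that conversion does: under ZeRO-3, each parameter is stored flattened and partitioned across the data-parallel ranks, and `zero_to_fp32.py` (invoked roughly as `python zero_to_fp32.py <checkpoint_dir> <output_file>` in recent DeepSpeed versions) stitches those per-rank fp32 partitions back into ordinary full-shape tensors. Below is a toy sketch of that idea only; the function names are hypothetical and this is not DeepSpeed's actual implementation:

```python
# Toy illustration (hypothetical names, not DeepSpeed's real code) of the
# concept behind zero_to_fp32.py: ZeRO-3 keeps parameters flattened and
# split across ranks; consolidation concatenates the rank shards and
# slices the flat buffer back into the original per-parameter tensors.
from typing import Dict, List, Tuple


def shard_params(params: Dict[str, List[float]],
                 world_size: int) -> Tuple[List[List[float]], Dict[str, int]]:
    """Flatten all params into one buffer and split it evenly across ranks
    (padding the tail with zeros), mimicking a ZeRO-3 partition."""
    flat: List[float] = []
    numels: Dict[str, int] = {}
    for name, values in params.items():
        numels[name] = len(values)
        flat.extend(values)
    flat.extend([0.0] * ((-len(flat)) % world_size))  # pad to divide evenly
    n = len(flat) // world_size
    shards = [flat[i * n:(i + 1) * n] for i in range(world_size)]
    return shards, numels


def consolidate(shards: List[List[float]],
                numels: Dict[str, int]) -> Dict[str, List[float]]:
    """Reverse the partitioning: concatenate the rank shards, then slice the
    flat buffer back into per-parameter tensors (padding is discarded)."""
    flat = [v for shard in shards for v in shard]
    out, offset = {}, 0
    for name, numel in numels.items():
        out[name] = flat[offset:offset + numel]
        offset += numel
    return out


params = {"linear.weight": [1.0, 2.0, 3.0, 4.0], "linear.bias": [5.0]}
shards, numels = shard_params(params, world_size=2)
assert consolidate(shards, numels) == params
```

The real script additionally reads the optimizer's fp32 master copies from each `*_optim_states.pt` file, which is why it can recover full precision even from fp16/bf16 training.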