XenonLamb

Results: 9 comments by XenonLamb

Have you found out how to fix this?

> @XenonLamb Hello, I'd like to ask if you have successfully trained using zero3 or zero3_offload. I used the zero3.json provided by llava. However, I encountered some problems when loading...

P.S. The way my case hangs seems similar to this issue: https://github.com/huggingface/transformers/issues/28803 . However, even after upgrading accelerate to 0.30.0, the issue is still not resolved.

@iamsile Has your case been resolved with the latest DeepSpeed version? I observed similar issues recently, typically with a BERT model and some linear layers under ZeRO-3; the training process...

To provide some context, here is the result file I obtained after running the evaluation script: [results (2).json](https://github.com/dvlab-research/LLaMA-VID/files/13881970/results.2.json)

Thank you! May I ask which api_base you used for evaluation? I found that GPT's behavior seems to differ for gpt-3.5-turbo on my api base, which caused about a 7% difference in...

> Hi, we use the purchased API base. We tested several times and did not find such a huge gap. Are other packages kept the same, like transformers?

Yes, the...

I also observed similar results to yours... I tried to start from MiniGPTV1's stages 1 and 2 by adding the Caption/REC/REG/VQA datasets mentioned in the v2 paper (except for GRIT...

> @caoyu-noob, you can use the `zero_to_fp32.py` script to convert the zero3 checkpoints into a regular pytorch checkpoint. You can find documentation of this script and other checkpoint conversion options...
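For context on what that conversion does: under ZeRO-3, each parameter is stored flattened and partitioned across the data-parallel ranks, and `zero_to_fp32.py` (invoked roughly as `python zero_to_fp32.py <checkpoint_dir> <output_file>` in recent DeepSpeed versions) stitches those per-rank fp32 partitions back into ordinary full-shape tensors. Below is a toy sketch of that idea only; the function names are hypothetical and this is not DeepSpeed's actual implementation:

```python
# Toy illustration (hypothetical names, not DeepSpeed's real code) of the
# concept behind zero_to_fp32.py: ZeRO-3 keeps parameters flattened and
# split across ranks; consolidation concatenates the rank shards and
# slices the flat buffer back into the original per-parameter tensors.
from typing import Dict, List, Tuple


def shard_params(params: Dict[str, List[float]],
                 world_size: int) -> Tuple[List[List[float]], Dict[str, int]]:
    """Flatten all params into one buffer and split it evenly across ranks
    (padding the tail with zeros), mimicking a ZeRO-3 partition."""
    flat: List[float] = []
    numels: Dict[str, int] = {}
    for name, values in params.items():
        numels[name] = len(values)
        flat.extend(values)
    flat.extend([0.0] * ((-len(flat)) % world_size))  # pad to divide evenly
    n = len(flat) // world_size
    shards = [flat[i * n:(i + 1) * n] for i in range(world_size)]
    return shards, numels


def consolidate(shards: List[List[float]],
                numels: Dict[str, int]) -> Dict[str, List[float]]:
    """Reverse the partitioning: concatenate the rank shards, then slice the
    flat buffer back into per-parameter tensors (padding is discarded)."""
    flat = [v for shard in shards for v in shard]
    out, offset = {}, 0
    for name, numel in numels.items():
        out[name] = flat[offset:offset + numel]
        offset += numel
    return out


params = {"linear.weight": [1.0, 2.0, 3.0, 4.0], "linear.bias": [5.0]}
shards, numels = shard_params(params, world_size=2)
assert consolidate(shards, numels) == params
```

The real script additionally reads the optimizer's fp32 master copies from each `*_optim_states.pt` file, which is why it can recover full precision even from fp16/bf16 training.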