
DeepSpeed giving Assertion Error

Open harik68 opened this issue 1 year ago • 6 comments

I am facing an issue when using DeepSpeed to fine-tune the StarCoder model. I am following exactly the steps in the article Creating a Coding Assistant with StarCoder (section "Fine-tuning StarCoder with DeepSpeed ZeRO-3"), but I get the error: "AssertionError: Check batch related parameters. train_batch_size is not equal to micro_batch_per_gpu * gradient_acc_step * world_size 256 != 4 * 8 * 1". I searched for this on Google and found a link explaining the cause: [BUG] batch_size check failed with zero 2 (deepspeed v0.9.0) · Issue #3228 · microsoft/DeepSpeed · GitHub. However, even with the deepspeed version that issue reports as working (v0.9.0), I get the same error. I tried different versions of deepspeed and accelerate but couldn't fix it. Does anyone have any suggestions? Thanks in advance.
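For context, the invariant DeepSpeed enforces at startup can be reproduced in a few lines. This is a minimal sketch (the function name `check_batch_params` is my own, not DeepSpeed's); it shows why the reported numbers fail: the launcher saw `world_size=1` (a single process), while `train_batch_size=256` only works out if 8 GPUs are visible (4 * 8 * 8 = 256).

```python
def check_batch_params(train_batch_size, micro_batch_per_gpu,
                       gradient_acc_steps, world_size):
    """Mirror DeepSpeed's batch-size consistency check:
    train_batch_size must equal
    micro_batch_per_gpu * gradient_acc_steps * world_size."""
    expected = micro_batch_per_gpu * gradient_acc_steps * world_size
    return train_batch_size == expected

# The failing combination from the traceback: 256 != 4 * 8 * 1
print(check_batch_params(256, 4, 8, 1))   # False

# Consistent combinations: either all 8 GPUs are visible to the
# launcher, or train_batch_size is scaled down to match world_size=1.
print(check_batch_params(256, 4, 8, 8))   # True
print(check_batch_params(32, 4, 8, 1))    # True
```

So a first thing to verify is how many processes the launcher actually started (e.g. `num_processes` in the accelerate config) versus the batch values in the DeepSpeed config.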

harik68 avatar Jul 19 '23 13:07 harik68

Same problem here, did you solve it at last?

jaywongs avatar Aug 02 '23 09:08 jaywongs

cc @lewtun @philschmid

pcuenca avatar Aug 02 '23 12:08 pcuenca

Same problem here, did you ever solve it? deepspeed 0.10.0, error message: AssertionError: Check batch related parameters. train_batch_size is not equal to micro_batch_per_gpu * gradient_acc_step * world_size 64 != 8 * 1 * 1

xbinglzh avatar Aug 03 '23 03:08 xbinglzh

@xbinglzh

Solved it by downgrading to transformers==4.29.2. But I don't think this fix applies to every situation where this problem occurs.
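Aside from pinning versions, the Hugging Face integration documents another way to avoid the mismatch: set the batch-related fields in the DeepSpeed config to "auto" so that Trainer/accelerate fills in values consistent with the actual world size and training arguments. A minimal config sketch (only the fields relevant to this error, assuming ZeRO-3 as in the article):

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "zero_optimization": {
    "stage": 3
  }
}
```

With hard-coded numbers instead of "auto", the config silently assumes a fixed GPU count, which is exactly what breaks when the launcher starts a different number of processes.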

jaywongs avatar Aug 07 '23 03:08 jaywongs

@pcuenca So, is this a transformers issue? Downgrading to transformers==4.29.2 did not help in my case.

BramVanroy avatar Aug 14 '23 20:08 BramVanroy

What was the problematic version of transformers?

ethanyanjiali avatar Oct 30 '23 22:10 ethanyanjiali