Lin Chenjian comments

Results: 15 comments of Lin Chenjian

the gradient of all parameters is None

I also need to extract p.grad for subsequent calculations. Is there any way to get p.grad correctly? I have read the above code but still don't know how to do...
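As background for the question above, here is a minimal sketch in plain PyTorch of the ordinary cases in which `p.grad` is `None`. It does not use ColossalAI or the Low_Level_Zero plugin, so it does not reproduce the behavior reported in the thread. The two-layer model and the `grad_status` helper are hypothetical, chosen only for illustration.

```python
import torch
from torch import nn

# Hypothetical toy model: two linear layers, the first one frozen.
model = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 1))
for p in model[0].parameters():
    p.requires_grad_(False)  # mimic non-trainable parameters


def grad_status(module):
    """Map each parameter name to True if its .grad is populated."""
    return {name: p.grad is not None for name, p in module.named_parameters()}


# Case 1: no backward pass has run yet, so every .grad is None.
before = grad_status(model)

# Case 2: after backward, only parameters with requires_grad=True get a grad.
loss = model(torch.randn(2, 4)).sum()
loss.backward()
after_backward = grad_status(model)

# Case 3: zero_grad(set_to_none=True) resets the grads to None again.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)
optimizer.zero_grad(set_to_none=True)
after_zero_grad = grad_status(model)
```

In plain PyTorch, gradients therefore have to be read after `loss.backward()` and before the next `zero_grad()` call. Whether the same holds under a wrapping optimizer depends on that wrapper and is not shown here.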

Why is every epoch so fast when I train? It is as if the data were not being loaded correctly.

> @xbyym You can insert `print(batch)` below [this line](https://github.com/hpcaitech/Open-Sora/blob/main/scripts/train.py#L264) and take a look.

I ran into the same problem: in most epochs, `print(batch)` shows no data, and only a very few epochs train normally. Why is that?

len(dataloader) is 0

I have run into the same problem.

len(dataloader) is 0

> I have run into the same problem.

I have solved this problem. You can change `bucket_config` to use a smaller batch size, and then it works.
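One way a smaller batch size could make `len(dataloader)` non-zero is sketched below. It assumes that samples are grouped into buckets and that each bucket drops its incomplete last batch. This is not confirmed from the Open-Sora source. The `count_batches` helper and the sample counts are hypothetical. The values 475 and 2 are the second tuple elements from the `stage1.py` excerpt quoted elsewhere in this thread, assumed here to be batch sizes.

```python
def count_batches(samples_per_bucket, batch_size_per_bucket):
    """Count full batches, assuming each bucket drops its incomplete last batch."""
    total = 0
    for bucket, num_samples in samples_per_bucket.items():
        total += num_samples // batch_size_per_bucket[bucket]
    return total


# Hypothetical small dataset: 100 single-frame samples and one 51-frame clip at 144p.
samples = {("144p", 1): 100, ("144p", 51): 1}

# Batch sizes assumed from the quoted config excerpt.
large = {("144p", 1): 475, ("144p", 51): 2}
# Hypothetical reduced batch sizes.
small = {("144p", 1): 8, ("144p", 51): 1}

batches_large = count_batches(samples, large)  # no bucket fills a single batch
batches_small = count_batches(samples, small)  # every bucket yields at least one batch
```

Under these assumptions, a dataset much smaller than the configured batch sizes yields zero batches, which would match the `len(dataloader) is 0` symptom.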

len(dataloader) is 0

You can find it in `/Open-Sora/configs/opensora-v1-2/train/stage1.py`:

```
bucket_config = {  # 12s/it
    "144p": {1: (1.0, 475), 51: (1.0, 2), 102: ((1.0, 0.33), 2), 204: ((1.0, 0.1), 13), 408: ((1.0, 0.1),...
```

len(dataloader) is 0

> > buck_config
> > pleas
> > > > I have meet the same question
> > > > > > i have solve this problem. you can change buck_config...

[BUG]: Low_Level_Zero plugin crashes with LoRA

Sorry to bother you, could you please describe it in more detail? Because I am using the 0.3.6 version of colossalai, I put the following code in the corresponding position...

[BUG]: Low_Level_Zero plugin crashes with LoRA

> Please share a minimum script to reproduce the error. Your code is wrong as _run_reduction reduces grads for all bucketed parameters. As far as I can tell, non-trainable params...

[BUG]: Low_Level_Zero plugin crashes with LoRA

> You can get the grads this way, described in the issue you mentioned: [hpcaitech/Open-Sora#283 (comment)](https://github.com/hpcaitech/Open-Sora/issues/283#issuecomment-2185800300)

I have read the above code before, but it did not involve zero_optizer in...

[BUG]: Low_Level_Zero plugin crashes with LoRA

> Does your training code involve an optimizer? That's what you're looking for.

Sorry to bother you again, I will refine my question. The following is a minimal reproduction of...