guozhiyao
I train `swin_tiny_patch4_window7_224` with one million classes and 100 million images using softmax loss and AdamW; the batch size is 600 and I train for 400,000 iterations, but the model...
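For reference, a minimal sketch of the setup this issue describes, assuming the model comes from timm; the learning rate, weight decay, and the tiny batch below are placeholders, not the original configuration.

```python
import torch
import timm

# Sketch of the reported setup: swin_tiny with a 1M-way softmax head and AdamW.
# The reported global batch size of 600 would be split across GPUs in practice;
# a small batch is used here so the sketch runs on one device.
model = timm.create_model("swin_tiny_patch4_window7_224", num_classes=1_000_000).cuda()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)

# Note: a 1M-class head on swin_tiny's 768-dim features is ~768M parameters
# by itself, which dominates both memory use and optimization difficulty.
images = torch.randn(8, 3, 224, 224).cuda()
labels = torch.randint(0, 1_000_000, (8,)).cuda()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```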
Hi, I am a bit confused about the update process of `w`. In the paper, only the sum of `w` is constrained to equal `task_num`, but nothing prevents...
I train the tiny model with one million classes and 100 million images using softmax loss and AdamW; the batch size is 600 and I train for 400,000 iterations, but the...
**Describe the bug** I train the model with ZeRO-2 for multi-node training and save it with `model.save_checkpoint`. When I try to get the state dict via `get_fp32_state_dict_from_zero_checkpoint`, it reports...
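For context, a minimal sketch of the save-then-merge flow this issue refers to, using a toy model and config; the checkpoint directory, tag, and hyperparameters are placeholders, not the reporter's setup.

```python
import torch
import torch.nn as nn
import deepspeed
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# Toy model and config standing in for the real multi-node training job.
model = nn.Linear(8, 2)
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},
}
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Every rank on every node must call save_checkpoint; if some ZeRO-2
# optimizer shards are missing, the later merge step fails.
engine.save_checkpoint("ckpt_dir", tag="step_0")

# Offline, on a single process with enough CPU RAM: merge the shards
# into a plain fp32 state dict and save it as a regular checkpoint.
state_dict = get_fp32_state_dict_from_zero_checkpoint("ckpt_dir", tag="step_0")
torch.save(state_dict, "pytorch_model.bin")
```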
**Describe the bug** I trained a GPT 13B model with ZeRO-3, but it seems that GPU memory usage does not decrease as the number of GPUs increases? In addition,...
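As background: under ZeRO-3, parameters, gradients, and optimizer states are all partitioned across ranks, so per-GPU memory for model states should shrink roughly as 1/world_size, while activation memory does not. A hedged config sketch follows; the values are illustrative, not the reporter's settings.

```python
# Illustrative ZeRO-3 config (placeholder values, not the original setup).
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                          # partition params + grads + optimizer states
        "overlap_comm": True,                # overlap all-gather/reduce with compute
        "stage3_max_live_parameters": 1e9,   # caps params gathered on a rank at once
        "stage3_prefetch_bucket_size": 5e8,
    },
}
```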
Hi, does the tokenizer you use have any special tokens? Do you only use ``? During training, is `` appended to the end of each sample to mark the end, and at inference time do you keep only the generation before `` as the output? And for multi-turn dialogue, should the `` from the dialogue history be kept? For example, should the input be `[prompt1][ans1][prompt2]`, or `[prompt1][ans1][prompt2]`?
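To make the two formats this question contrasts concrete, a small sketch; `EOS` below is a hypothetical stand-in, since the actual special token was stripped from the original post.

```python
# EOS is a hypothetical placeholder for the stripped special token
# (e.g. the tokenizer's end-of-sequence token).
EOS = "</s>"

history = [("prompt1", "ans1")]
new_prompt = "prompt2"

# Variant A: keep the end-of-turn token after every historical answer.
with_eos = "".join(f"[{p}][{a}]{EOS}" for p, a in history) + f"[{new_prompt}]"

# Variant B: drop it from history; the model only emits it at the very end.
without_eos = "".join(f"[{p}][{a}]" for p, a in history) + f"[{new_prompt}]"
```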
Could you provide the training logs of stage 1 and stage 2?
Could you open-source the code for instruction fine-tuning? I would like to know which trainable parameters were added, as well as some of the training details. Many thanks.