Ethan Smith
I'm finding some of my processes are able to load the model states while others fail to do so. Edit: in my case I realized I am going from one...
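For reference, roughly the pattern involved (a minimal sketch assuming a Hugging Face Accelerate setup; the checkpoint path is a placeholder): every rank waits until saving is done before any rank reads the state back.

```python
# Sketch, assuming Accelerate: synchronize ranks before restoring state,
# so no process tries to read a checkpoint another process is still writing.
from accelerate import Accelerator

accelerator = Accelerator()
# ... build model/optimizer and accelerator.prepare(...) them, then:

# Block until every process reaches this point.
accelerator.wait_for_everyone()
accelerator.load_state("checkpoints/step_1000")  # placeholder path
```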
I think the error suggests a vanishing gradient, but it's strange that I don't see it when using fp16 or full precision
I did see some comments in the docs about how num_proc=None could help, and that outputting numpy arrays can also help, but it seems quite odd that it's now dropping down to 1 it/s...
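For what it's worth, this is my reading of those two suggestions (a sketch; the dataset, column names, and `tokenize` function are placeholders, not my actual pipeline):

```python
# Sketch of the two suggestions from the datasets docs:
# run .map() without multiprocessing and return numpy arrays.
import numpy as np
from datasets import load_dataset

ds = load_dataset("imdb", split="train")  # placeholder dataset

def tokenize(batch):
    # Returning numpy arrays avoids slow Python-object conversion
    # when the results are written back to Arrow.
    return {"length": np.array([len(t) for t in batch["text"]])}

# num_proc=None runs map in the main process only.
ds = ds.map(tokenize, batched=True, num_proc=None)
```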
Is there any reason I would see this if training a single model? It only occurs with fp16; bf16 and fp32 do not result in this error
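Since it only shows up under fp16, one way to confirm it's an overflow/underflow rather than a code bug is to scan the gradients for non-finite values after backward. A minimal sketch, assuming a standard torch AMP loop (the loop itself is shown as comments):

```python
# Sketch: check gradients for NaN/inf under fp16 to confirm a range issue.
import torch

def check_grads(model: torch.nn.Module, step: int) -> None:
    for name, p in model.named_parameters():
        if p.grad is not None and not torch.isfinite(p.grad).all():
            print(f"step {step}: non-finite grad in {name}")

# Inside a typical AMP training loop:
# scaler = torch.cuda.amp.GradScaler()
# with torch.autocast("cuda", dtype=torch.float16):
#     loss = model(batch).loss
# scaler.scale(loss).backward()
# scaler.unscale_(optimizer)   # grads back at true scale here
# check_grads(model, step)
# scaler.step(optimizer); scaler.update()
```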
Hey @artykov1511, see UnCLIPPipeline in diffusers, which uses the same methodology of projection onto timestep embeddings and extra context tokens :)
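The idea looks roughly like this (an illustrative sketch, not diffusers' actual modules; dimensions and names are placeholders): project the conditioning embedding once onto the timestep embedding and once into extra tokens that get concatenated with the text context.

```python
# Illustrative sketch of the unCLIP-style conditioning pattern.
import torch
import torch.nn as nn

class EmbeddingProjection(nn.Module):
    def __init__(self, emb_dim=768, time_dim=1280, ctx_dim=768, n_tokens=4):
        super().__init__()
        self.to_time = nn.Linear(emb_dim, time_dim)        # added to t-embedding
        self.to_tokens = nn.Linear(emb_dim, n_tokens * ctx_dim)
        self.n_tokens, self.ctx_dim = n_tokens, ctx_dim

    def forward(self, cond_emb, time_emb, text_ctx):
        # cond_emb: (B, emb_dim), time_emb: (B, time_dim), text_ctx: (B, L, ctx_dim)
        time_emb = time_emb + self.to_time(cond_emb)
        extra = self.to_tokens(cond_emb).view(-1, self.n_tokens, self.ctx_dim)
        # Extra tokens sit alongside the text context, so cross-attention
        # can attend to them like additional "words".
        context = torch.cat([extra, text_ctx], dim=1)
        return time_emb, context
```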
@muellerzr Thank you
The functions they use search through all named attention layers of the model and make the modifications as needed for self-attention and cross-attention, so I should think that shouldn't...
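The pattern, as I understand it (a sketch; the actual patch logic is a placeholder), relies on the diffusers naming convention where `attn1` is self-attention and `attn2` is cross-attention inside each transformer block:

```python
# Sketch: walk the UNet's named modules and modify attention layers.
def patch_attention(unet):
    for name, module in unet.named_modules():
        if name.endswith("attn1"):
            # self-attention: modify as needed, e.g. wrap module.forward
            pass
        elif name.endswith("attn2"):
            # cross-attention: modify as needed
            pass
```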
Nevermind, got it working! I didn't realize that the prompt that goes into the text encoder has to be the new one. I'll be trying your repo as well afterwards
@liujianzhi @dbolya I am having a similar issue running on an A100. My baseline time is 25 it/s on float16; at a 50% ratio my time only gets up to about 25.5 it/s...
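Roughly how I'm measuring (a sketch; the model id and prompt are placeholders for my actual setup, and I'm assuming the tomesd `apply_patch` entry point):

```python
# Sketch: patch the pipeline with token merging at a 50% ratio and
# compare wall-clock throughput against the unpatched baseline.
import time
import torch
import tomesd
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # placeholder
).to("cuda")

tomesd.apply_patch(pipe, ratio=0.5)  # merge ~50% of tokens

torch.cuda.synchronize()
start = time.perf_counter()
pipe("a photo of an astronaut", num_inference_steps=50)
torch.cuda.synchronize()
print(f"{50 / (time.perf_counter() - start):.1f} it/s")
```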
Thank you Daniel, makes sense!