Giyeong Oh

Results: 40 comments by Giyeong Oh

Alternatively, pass `--save_state` at the beginning of training from scratch, and then resume with `--resume="STATE_IN_OUTPUT_DIR"`. These options save and load the optimizer state and weights.
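As a hedged sketch of the flow described above (the script name `flux_train_network.py` and all paths are placeholders; only `--save_state`, `--resume`, and `--output_dir` are the flags under discussion):

```shell
# First run from scratch: --save_state writes the full training state
# (optimizer state + weights) into --output_dir alongside the checkpoints.
accelerate launch flux_train_network.py \
  --output_dir /path/to/output \
  --save_state
  # ...plus your other training arguments

# Later run: point --resume at the saved state directory to continue training.
accelerate launch flux_train_network.py \
  --output_dir /path/to/output \
  --resume /path/to/output/last-state
  # ...same training arguments as before
```

The state directory name (`last-state` here) depends on your save settings; check what was actually written under `--output_dir`.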

> The error was not caused by DDP multi-gpu training, but by DeepSpeed... The original DDP multi-gpu training was fine with Flux Lora training, but as long as you installed...

> > Updated the sd3 branch. Multi-GPU training should now work. Please report again if the issue remains.
>
> I have four A100-40G. Is it feasible to train the flux model...

> > Updated the sd3 branch. Multi-GPU training should now work. Please report again if the issue remains.
>
> I...

> @BootsofLagrangian Hi there, hope I can reach you. I also get this dtype error when training a Flux LoRA with DeepSpeed multi-GPU. Do you have any updates...

> > [@terrificdm](https://github.com/terrificdm) With the RTX3090 (24GB) at image resolution 1024, multi-GPU Flux finetuning hits OOM. Have you ever run into the same problem? Thanks a lot! My config script is: ![image](https://private-user-images.githubusercontent.com/26437644/387115225-322abcbc-04f8-4ba6-915a-882fb840f8d8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzIyOTI1OTgsIm5iZiI6MTczMjI5MjI5OCwicGF0aCI6Ii8yNjQzNzY0NC8zODcxMTUyMjUtMzIyYWJjYmMtMDRmOC00YmE2LTkxNWEtODgyZmI4NDBmOGQ4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDExMjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQxMTIyVDE2MTgxOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWM3ZjgyYTRhOTcyNmFiNTFmODZlZjFmZjFiMTliNGRlYzhhZDFlZTQxMzI4NmUxNDU3ODgzNzc5ZWY5MGE1MDImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.Mknh1w_GtZTJfq7mg7LAxOrQkOlhlovRmKohvMjymNs)...
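Not a confirmed fix for the OOM above, but a sketch of the usual memory-reduction knobs in sd-scripts/accelerate (the script name is a placeholder, and whether these settings fit in 24GB for Flux at 1024px is an assumption, not a tested claim):

```shell
# Hypothetical memory-saving configuration for 24GB cards; adjust to taste.
accelerate launch --multi_gpu --num_processes 2 flux_train_network.py \
  --mixed_precision bf16 \
  --gradient_checkpointing \
  --train_batch_size 1 \
  --gradient_accumulation_steps 4
  # ...plus the rest of your usual arguments
```

Gradient checkpointing trades compute for memory, and gradient accumulation keeps the effective batch size up while the per-step batch stays at 1.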

It is supported on the sd-3 branch. If you run into problems, please attach the logs.

> > Unfortunately multi GPU training of FLUX has not been tested yet. `--split_mode` doesn't seem to work with multi GPU training.
>
> The current single-card training is indeed...
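For reference, multi-GPU runs go through accelerate's process spawning rather than `--split_mode` (which splits one model across devices within a single process). A minimal sketch, with the script name as a placeholder:

```shell
# Sketch: let accelerate spawn one DDP process per GPU; do not pass --split_mode.
accelerate launch --multi_gpu --num_processes 4 flux_train_network.py
# ...followed by your usual training arguments
```

`--num_processes` should match the number of visible GPUs (here 4, matching the setup described in the thread).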

> Has anyone managed to get it working properly? I have 4 GPUs, but it still only runs on the first GPU. ![image](https://private-user-images.githubusercontent.com/78720117/362176580-29c61ce0-a6f5-4f51-b678-fe66533c2cf0.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjQ4MzM1MjcsIm5iZiI6MTcyNDgzMzIyNywicGF0aCI6Ii83ODcyMDExNy8zNjIxNzY1ODAtMjljNjFjZTAtYTZmNS00ZjUxLWI2NzgtZmU2NjUzM2MyY2YwLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA4MjglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwODI4VDA4MjAyN1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWViNjljNGUxMDI5YmQ0NWM4Njk5ZGNiNTQ1MzU4ODU1ZDc1MzBkYzdmOTIwOWQyMWFlODg3YmU1OWFlYzBkNDMmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.eDivIp1hK05cKojnHc0Na74fPLI3pzqONLtEHOwoxyM) Is the trainer in the caching process?