xuanhua
Hi guys, I found an assertion failure when training Hugging Face's LoRA-based model in pipeline style. Here are all the steps I used to create my model: 1)...
**Describe the bug** I have two Ubuntu machines connected by a 10 Gb/s Ethernet cable, and I want to use DeepSpeed on these two machines to run a model training...
When I installed icetk with `pip install icetk`, I saw that its version is 0.0.5. But when I went back to this code repo, I could not find...
Hi guys, I have an M1 Ultra Mac Studio and a Linux box, and I want to use both of them for distributed model training. But I found that gloo does...