matrixssy

15 comments from matrixssy

Encountered the same issue! Training ALPACA_LORA-13B.

> Mixtral may not support ZeRO-3

I've noticed the same thing with the DeepSpeed-Chat architecture: with ZeRO-3 enabled, GPU memory doesn't seem to be partitioned correctly, which eventually leads to an OOM.
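
For reference, a minimal ZeRO-3 config sketch (the numeric values are illustrative, not tuned for Mixtral); whether parameters actually get sharded also depends on how the model is instantiated, e.g. calling `from_pretrained` while the ZeRO-3 config is active so weights are partitioned at load time:

```python
# Sketch of a DeepSpeed ZeRO-3 config as a Python dict (can also be written as a JSON file).
# The keys are standard DeepSpeed options; the specific values are assumptions.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # shard parameters, gradients and optimizer states
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_max_live_parameters": 1e9,
        "stage3_prefetch_bucket_size": 5e7,
        "stage3_param_persistence_threshold": 1e4,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}
```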

On my side it doesn't hang; instead I get a stranger error:

```
[INFO|trainer.py:1709] 2023-12-29 09:19:48,906 >> ***** Running training *****
[INFO|trainer.py:1710] 2023-12-29 09:19:48,906 >> Num examples = 64,000
[INFO|trainer.py:1711] 2023-12-29 09:19:48,906 >> Num Epochs = 9,223,372,036,854,775,807
[INFO|trainer.py:1712] 2023-12-29 09:19:48,906 >> ...
```
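
As an aside, the `Num Epochs = 9,223,372,036,854,775,807` line itself is not the error: that value is just `sys.maxsize`, which the HF Trainer falls back to when the train dataset has no length (e.g. a streaming `IterableDataset`) and the run is bounded by `max_steps` instead. A minimal sketch (paths and step counts are placeholders):

```python
import sys
from transformers import TrainingArguments

print(sys.maxsize)  # 9223372036854775807, the value shown as "Num Epochs" above

# With an IterableDataset the epoch count is meaningless, so training length
# is controlled by max_steps rather than num_train_epochs.
args = TrainingArguments(
    output_dir="out",  # placeholder
    max_steps=1000,
)
```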

> Great work! Could you provide a script to convert Megatron Mixtral to HF?

Still working on it.
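
In the meantime, a rough outline of what such a converter would need. The name mapping below is only a placeholder; the exact Megatron/MCore parameter names depend on the checkpoint layout, so treat this as a skeleton rather than a working script:

```python
import torch
from transformers import MixtralConfig, MixtralForCausalLM

# Placeholder mapping from Megatron parameter names to HF Mixtral names; this has to be
# filled in (and fused QKV / MoE expert weights usually need splitting or transposing,
# not just renaming).
MEGATRON_TO_HF = {
    # "language_model.embedding.word_embeddings.weight": "model.embed_tokens.weight",
    # ...
}

def convert(megatron_ckpt_path: str, hf_config_path: str, out_dir: str) -> None:
    config = MixtralConfig.from_pretrained(hf_config_path)
    model = MixtralForCausalLM(config)

    ckpt = torch.load(megatron_ckpt_path, map_location="cpu")
    mg_state = ckpt.get("model", ckpt)  # Megatron checkpoints usually nest weights under "model"

    hf_state = {}
    for mg_name, tensor in mg_state.items():
        hf_name = MEGATRON_TO_HF.get(mg_name)
        if hf_name is None:
            print(f"unmapped parameter, skipping: {mg_name}")
            continue
        hf_state[hf_name] = tensor

    missing, unexpected = model.load_state_dict(hf_state, strict=False)
    print(f"missing: {len(missing)}, unexpected: {len(unexpected)}")
    model.save_pretrained(out_dir, safe_serialization=True)
```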

> Hi, I wonder if the loss is normal after converting and training Mixtral with Megatron on your machine. I applied this PR and the initial loss is quite high,...

> Hi, I fixed a bug in my script and now the initial loss is normal (around 2.3 on the arXiv dataset). Thanks for your contribution!
>
> Also, I have...
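
For context, a loss of about 2.3 is plausible for a model that already predicts well: it corresponds to a perplexity of roughly e^2.3 ≈ 10, whereas a randomly initialised model with the 32k Mistral vocabulary would start near ln(32000) ≈ 10.4. A quick check:

```python
import math

print(math.exp(2.3))    # ≈ 9.97  -> perplexity implied by a loss of 2.3
print(math.log(32000))  # ≈ 10.37 -> expected initial loss for a uniform 32k-vocab model
```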

> Hi, @matrixssy. Thanks for your contribution. There are some ongoing efforts internally at NVIDIA on the Mixtral 8x7B example. We will support converting the HF checkpoint to an MCore checkpoint...

> Hi, when I set target-tensor-parallel-size > 1, I get the following errors; only setting target-tensor-parallel-size = 1 works. Is it possible that it is related to the following...
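
Without the full traceback this is only a guess, but one common failure with target-tensor-parallel-size > 1 is a weight whose partition dimension is not divisible by the TP size, e.g. when fused or padded MoE expert weights differ from what the converter expects. A minimal sketch of the divisibility constraint (shapes are Mixtral-8x7B defaults, assumed):

```python
import torch

def split_for_tensor_parallel(weight: torch.Tensor, tp_size: int, dim: int = 0):
    """Shard one weight along its partition dimension across TP ranks."""
    if weight.size(dim) % tp_size != 0:
        raise ValueError(
            f"size {weight.size(dim)} along dim {dim} is not divisible by tp_size={tp_size}"
        )
    return torch.chunk(weight, tp_size, dim=dim)

# An expert w1 of shape (14336, 4096) splits cleanly for tp_size = 2, 4 or 8;
# a fused or padded tensor that does not divide evenly would raise here.
shards = split_for_tensor_parallel(torch.empty(14336, 4096), tp_size=2)
print([tuple(s.shape) for s in shards])
```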