Ma, Guokai
> [@Antlera](https://github.com/Antlera) Thanks for this very detailed analysis! It gives a good suggestion on what the default value should be. Maybe making `ds_core_num` bigger when there is an abundant number of cores would...
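A minimal sketch of the heuristic being discussed (`ds_core_num` is the knob from this thread; `pick_ds_core_num` is just an illustrative helper, not an existing DeepSpeed API):

```python
import os

def pick_ds_core_num(num_local_workers: int, reserve: int = 2) -> int:
    """Pick a per-worker core count from the cores this process may actually use."""
    # Linux-only: honors taskset/cgroup/Slurm affinity, unlike os.cpu_count().
    available = len(os.sched_getaffinity(0))
    # Leave a couple of cores for the data loader / OS, then split the rest evenly.
    usable = max(available - reserve, 1)
    return max(usable // num_local_workers, 1)

print(pick_ds_core_num(num_local_workers=2))
```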
> [@delock](https://github.com/delock) [@sfc-gh-truwase](https://github.com/sfc-gh-truwase) Some thoughts on the auto-tuning feature. Personally, I’d lean toward a simple script that runs a dummy model to stress the CPU side. Since the main goal...
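Something like the following would probably be enough as that dummy-model stress script; a rough sketch assuming a plain PyTorch CPU matmul is an acceptable proxy for the real workload (the helper name and candidate list are illustrative):

```python
import time
import torch

def best_thread_count(candidates=(1, 2, 4, 8, 16), size=2048, iters=5):
    """Time a dummy CPU workload at several thread counts and return the fastest."""
    a = torch.randn(size, size)
    b = torch.randn(size, size)
    timings = {}
    for n in candidates:
        torch.set_num_threads(n)     # intra-op thread count for subsequent CPU ops
        torch.matmul(a, b)           # warm-up
        start = time.perf_counter()
        for _ in range(iters):
            torch.matmul(a, b)
        timings[n] = (time.perf_counter() - start) / iters
    best = min(timings, key=timings.get)
    return best, timings

if __name__ == "__main__":
    best, timings = best_thread_count()
    print(f"best thread count: {best}", timings)
```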
> [@delock](https://github.com/delock) Did a very quick test in the slurm setting. It looks like the current soft fallback still has issues under Slurm. For example, I requested 32 CPU cores,...
I tried to emulate this situation with the following command:

```
taskset -c 0,4-11,21-26,30-46 deepspeed --num_gpus=2 finetune_llama.py --model_name Qwen/Qwen2.5-3B --output_dir output --lr 2e-5 --batch_size 8 --deepspeed_config zf_config.json --num_train_epochs 1
```

...
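To double-check what each rank actually sees under this emulation (and under Slurm itself), a small probe like this can be dropped into the training script; `LOCAL_RANK` is set by the DeepSpeed launcher, the rest is plain Python:

```python
import os
import socket

rank = int(os.environ.get("LOCAL_RANK", "0"))
allowed = sorted(os.sched_getaffinity(0))   # cores this process is allowed to run on
print(f"[{socket.gethostname()} rank {rank}] "
      f"os.cpu_count()={os.cpu_count()} allowed={len(allowed)} cores: {allowed}")
```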
Thanks for the detail @Antlera. Let me read the Slurm docs to see if I can find any clue. If not, then let's add `taskset` as a practical hint.
Does "DeepSpeed backend integration as the training engine for verl" means to be the default training engine for verl?
> https://www.deepspeed.ai/tutorials/automatic-tensor-parallelism/#supported-models, it looks like the qwen2.5 is not in the supported model list > > [@delock](https://github.com/delock), FYI Just verified that AutoTP supports Qwen2.5-7B; the list should be updated. Will...
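For the record, the verification followed the AutoTP tutorial pattern, roughly the sketch below (model name aside, the `init_inference` kwargs mirror the tutorial and may need adjusting to the installed DeepSpeed version), launched through the `deepspeed` launcher with 2 GPUs:

```python
import os
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B"
world_size = int(os.environ.get("WORLD_SIZE", "1"))

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# AutoTP path: kernel injection disabled, layers are sharded across ranks automatically.
engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": world_size},
    dtype=torch.bfloat16,
    replace_with_kernel_inject=False,
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(engine.module.device)
print(tokenizer.decode(engine.module.generate(**inputs, max_new_tokens=20)[0]))
```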
> I trained model qwen2.5-7B in GPU with llamafactory deepspeed zero2 + autotp. And there is no obvious memory reduction. > > When individually using zero2, the average memory of...
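For context, the combination in that report corresponds to a DeepSpeed config roughly like the sketch below; the `tensor_parallel` / `autotp_size` key follows the AutoTP training write-up and is an assumption to verify against the installed version:

```python
# Sketch of the configuration being compared: ZeRO-2 plus AutoTP training.
ds_config = {
    "train_batch_size": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    # Assumed key from the AutoTP training feature; check your DeepSpeed version.
    "tensor_parallel": {"autotp_size": 2},
}
```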
Hi @phalani-paladugu, thanks for the suggestion. Agree that a multi-node fallback should be added to support multi-node inference. For single-node SHM, I notice that there is a [RISC-V implementation](https://github.com/deepspeedai/DeepSpeed/blob/b7cd78f096016ae67a11ef6292eba28e0452b4e7/csrc/cpu/comm/riscv64/shm.h), is...
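On the multi-node side, the fallback I have in mind is roughly: detect whether every rank lives on the same host and, if not, skip the SHM path and use the regular backend allreduce. A sketch (the `shm_allreduce` hook is purely illustrative, not the existing csrc/cpu/comm API):

```python
import socket
import torch
import torch.distributed as dist

def allreduce_with_fallback(tensor: torch.Tensor, shm_allreduce=None) -> torch.Tensor:
    """Use the single-node SHM fast path only when all ranks share one host."""
    hostnames = [None] * dist.get_world_size()
    dist.all_gather_object(hostnames, socket.gethostname())

    if len(set(hostnames)) == 1 and shm_allreduce is not None:
        shm_allreduce(tensor)      # hypothetical single-node SHM path
    else:
        dist.all_reduce(tensor)    # portable multi-node fallback (ccl/gloo/nccl)
    return tensor
```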
The implementation of `cpu_arch()` normally returns `-march=native`, so `x86-64-v3` looks abnormal to me. Hi @Ali-Sayed-Salehi, some debugging of this function should reveal the exact line that returns `-march=x86-64-v3`,...
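A quick probe that usually narrows this down (assuming the op builder classes are importable like this in your install; adjust the import if the path differs):

```python
# Print what the builder decides on this machine; the expectation is "-march=native",
# so anything else points at a fallback branch being taken inside cpu_arch().
from deepspeed.ops.op_builder import CPUAdamBuilder

builder = CPUAdamBuilder()
print("cpu_arch():", builder.cpu_arch())
print("cxx_args():", builder.cxx_args())
```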