Tingfeng Lan
## Benchmark on CPU binding methods

Hi @delock. Thanks for confirming the command. Please see my benchmark for CPU core binding and overhead breakdown. The overhead is shown as the...
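For anyone who wants to reproduce a similar comparison, here is a minimal sketch of this kind of binding-overhead measurement (illustrative only, not the exact benchmark script; `bind_to_cores` and the matmul workload are stand-ins for the CPU-side optimizer work):

```python
import time

import psutil
import torch


def bind_to_cores(cores):
    # Restrict the current process (and the threads it spawns) to the given cores.
    psutil.Process().cpu_affinity(list(cores))


def cpu_step(n=4096):
    # Stand-in CPU workload; a real run would time the actual CPU optimizer step instead.
    a = torch.randn(n, n)
    b = torch.randn(n, n)
    return a @ b


if __name__ == "__main__":
    all_cores = psutil.Process().cpu_affinity()
    for label, cores in [("unbound", all_cores), ("bound-8", all_cores[:8])]:
        bind_to_cores(cores)
        start = time.perf_counter()
        cpu_step()
        print(f"{label}: {time.perf_counter() - start:.3f}s on {len(cores)} cores")
```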
Another quick comment on a potential bug. The default `--bind_to_rank` implementation using `numactl` can be problematic for Slurm users, since they only have access rights to a subset of cores...
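A small sketch of the failure mode (illustrative only, not the actual binding code; the `LOCAL_RANK`/`LOCAL_SIZE` env var names are assumptions): splitting by the node's physical core count can select core IDs that the Slurm cgroup never granted, and binding to them then fails.

```python
import os

import psutil

# Env var names are illustrative; the local rank / local world size may come from
# different variables depending on the launcher.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
local_size = int(os.environ.get("LOCAL_SIZE", "1"))

physical_cores = psutil.cpu_count(logical=False) or psutil.cpu_count() or 1  # whole node
allowed = set(psutil.Process().cpu_affinity())  # only the cores Slurm actually granted

per_rank = max(1, physical_cores // local_size)
wanted = list(range(local_rank * per_rank, (local_rank + 1) * per_rank))

outside = [c for c in wanted if c not in allowed]
if outside:
    print(f"rank {local_rank}: binding would request cores outside the cgroup: {outside[:8]} ...")
```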
Hi @delock. Regarding your logs and questions: the transmit throughput looks a bit slow here, only around 12 GB/s. On my side I usually see ~200 ms for this stage...
> > [@Antlera](https://github.com/Antlera) Thanks for this very detailed analysis! It gives a good suggestion on what the default value should be. Maybe make `ds_core_num` bigger when there is an abundant number of cores...
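One way the default could scale with the core count (just a sketch of the heuristic; the name `default_ds_core_num` and the cap value are made up for illustration, not a tuned proposal):

```python
import psutil


def default_ds_core_num(local_world_size: int, cap: int = 8) -> int:
    # Split only the cores this process can actually see among the local ranks,
    # and cap the per-rank share so we do not grab the whole node by default.
    visible = len(psutil.Process().cpu_affinity())
    return max(1, min(cap, visible // max(1, local_world_size)))
```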
@delock Thanks for implementing the soft fallback in #7506. I’ll run a quick test on it soon.
@delock @sfc-gh-truwase Some thoughts on the auto-tuning feature. Personally, I’d lean toward a simple script that runs a dummy model to stress the CPU side. Since the main goal is...
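Something along these lines is what I have in mind for the dummy stress script (a sketch only; the workload and sweep values are placeholders): sweep the CPU thread count over an optimizer-like update and keep the smallest count past which the step time stops improving.

```python
import time

import torch


def cpu_step(params, grads, lr=1e-3):
    # Stand-in for the CPU-side optimizer work: a plain SGD update.
    for p, g in zip(params, grads):
        p.add_(g, alpha=-lr)


def sweep(thread_counts=(1, 2, 4, 8, 16), numel=50_000_000, iters=5):
    params = [torch.randn(numel)]
    grads = [torch.randn(numel)]
    timings = {}
    for n in thread_counts:
        torch.set_num_threads(n)
        start = time.perf_counter()
        for _ in range(iters):
            cpu_step(params, grads)
        timings[n] = (time.perf_counter() - start) / iters
        print(f"{n:>2} threads: {timings[n] * 1e3:.1f} ms per step")
    return timings
```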
@delock Did a very quick test in a Slurm setting. It looks like the current soft fallback still has issues under Slurm. For example, I requested 32 CPU cores, but...
Maybe for the fallback case it would be safer to base the core split on the CPUs visible to the current process (e.g. `num_cores = len(psutil.Process().cpu_affinity())`) instead of relying on...
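For illustration, a minimal sketch of that fallback (assuming `psutil` is available, with `os.sched_getaffinity` / `os.cpu_count` as further fallbacks where it is not):

```python
import os


def visible_core_count() -> int:
    # Count only the CPUs the current process is allowed to run on, which under
    # Slurm is the cgroup-restricted set rather than the whole node.
    try:
        import psutil
        return len(psutil.Process().cpu_affinity())
    except Exception:
        try:
            return len(os.sched_getaffinity(0))  # Linux only
        except AttributeError:
            return os.cpu_count() or 1
```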
@delock I used `deepspeed --num_gpus=$GPUS_PER_NODE --master_port $MASTER_PORT finetune_llama.py`. Let me double-check I am at the right branch head.
> [@delock](https://github.com/delock) I used `deepspeed --num_gpus=$GPUS_PER_NODE --master_port $MASTER_PORT finetune_llama.py`. Let me double-check I am at the right branch...

I am currently at commit `744399e` [Merge branch 'master' into gma/zenflow_affinity](https://github.com/deepspeedai/DeepSpeed/pull/7506/commits/744399e096313e7f0eb18026e503ce3a6cf81829).