Ma, Guokai
Better core binding in torch.backends.xeon.run_cpu when launched from torchrun with --nproc-per-node
This PR fixes the behavior of `torch.backends.xeon.run_cpu` when it is launched from `torchrun` with the `--nproc-per-node` parameter. As a CPU launcher, `run_cpu` binds cores to each instance it launches using `numactl`, and...
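A rough sketch of the underlying idea, not the actual `run_cpu` implementation: `torchrun` exports `LOCAL_RANK` and `LOCAL_WORLD_SIZE` to each process, so an instance can compute a disjoint core range for itself instead of binding to all cores.

```python
import os

# Illustrative only: derive a per-instance core range from the environment
# variables torchrun sets for every local worker.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
local_world_size = int(os.environ.get("LOCAL_WORLD_SIZE", "1"))
num_cores = os.cpu_count() or 1

cores_per_instance = num_cores // local_world_size
start = local_rank * cores_per_instance
end = start + cores_per_instance - 1

# A launcher like run_cpu would then hand this range to numactl, roughly:
#   numactl -C <start>-<end> python <workload> ...
print(f"instance {local_rank}/{local_world_size} -> cores {start}-{end}")
```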
I hit this when running evaluation with a small number of parallel games: `Ran 0 batches with an average size of -nan`. Is it possible that a ModelBatcher might...
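For reference, that `-nan` is just what averaging over zero batches looks like under IEEE float semantics; a toy reproduction (not minigo code):

```python
import numpy as np

# Averaging zero items divides 0.0 by 0.0, which is NaN in IEEE arithmetic
# (numpy mirrors the C++ float behavior behind the logged message).
num_batches = np.float64(0.0)
total_size = np.float64(0.0)

with np.errstate(invalid="ignore"):
    average_size = total_size / num_batches

print(f"Ran {int(num_batches)} batches with an average size of {average_size}")
# -> Ran 0 batches with an average size of nan
```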
Currently it is a magic number (0.55) in https://github.com/tensorflow/minigo/blob/master/ml_perf/eval_models.py#L47
@abhilash1910 Is XPU support for qlora still working? I tried to run it on a Linux Arc 770 system at home but got the following error: $ python qlora.py --model_name_or_path facebook/opt-350m...
This PR adds a `--client-only` flag to the MII benchmark, allowing the benchmark to skip `start_server` and `stop_server` when running with a backend such as vLLM. This flag provides the flexibility to start...
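A simplified sketch of how such a flag can gate the server lifecycle; the helper bodies and the `--endpoint` option are placeholders, not the actual benchmark code:

```python
import argparse

def start_server(args):
    print("starting serving backend ...")   # placeholder for the real launch logic

def stop_server(args):
    print("stopping serving backend ...")   # placeholder for the real teardown logic

def send_requests(args):
    print(f"driving benchmark load against {args.endpoint}")  # placeholder

parser = argparse.ArgumentParser()
parser.add_argument("--client-only", action="store_true",
                    help="skip start_server/stop_server and reuse an already running backend")
parser.add_argument("--endpoint", default="http://localhost:8000")  # hypothetical option
args = parser.parse_args()

# With --client-only the benchmark leaves server management to the user,
# which is what an externally launched backend such as vLLM needs.
if not args.client_only:
    start_server(args)
try:
    send_requests(args)
finally:
    if not args.client_only:
        stop_server(args)
```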
This PR adds a new client that can test the performance of an LLM serving endpoint that conforms to the OpenAI API. This gives the flexibility to start a server separately and benchmark that server...
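A minimal sketch of timing one request against an OpenAI-compatible endpoint; the URL, model name, and prompt below are placeholders:

```python
import time
import requests

BASE_URL = "http://localhost:8000/v1"    # wherever the separately started server listens
payload = {
    "model": "facebook/opt-350m",        # placeholder model name
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64,
}

start = time.perf_counter()
resp = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=120)
latency = time.perf_counter() - start
resp.raise_for_status()

usage = resp.json().get("usage", {})
print(f"latency: {latency:.3f}s, completion tokens: {usage.get('completion_tokens')}")
```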
This issue acts as a tracker for Intel customer-support-related PRs. The purpose is to get an understanding of what each PR does and how important they are compared...
I'm wondering if we can take the ZenFlow finetuning example and extend it into a test bed for different DeepSpeed technologies. The ZenFlow finetuning example: https://github.com/deepspeedai/DeepSpeedExamples/tree/master/training/DeepSpeed-ZenFlow/finetuning The reason is...
**Describe the bug** `DeepSpeedZeroOptimizer_Stage3` and `SuperOffloadOptimizer_Stage3` share the same parameter list, which can easily cause divergence. **Details** In https://github.com/deepspeedai/DeepSpeed/blob/b7cd78f096016ae67a11ef6292eba28e0452b4e7/deepspeed/runtime/engine.py#L1846, the `DeepSpeedZeroOptimizer_Stage3` and `SuperOffloadOptimizer_Stage3` initializers share the same parameter list. This...
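A toy illustration (not DeepSpeed code) of why handing the same list object to two optimizer initializers is hazardous: any in-place mutation by one is visible to the other, so their state can silently diverge from what either expects.

```python
class ToyOptimizer:
    def __init__(self, param_groups):
        # Storing the caller's list directly keeps the alias alive.
        self.param_groups = param_groups

    def add_group(self, group):
        self.param_groups.append(group)

shared = [{"params": ["w0", "w1"]}]
opt_a = ToyOptimizer(shared)
opt_b = ToyOptimizer(shared)

opt_a.add_group({"params": ["w2"]})
print(len(opt_b.param_groups))  # 2 -- opt_b changed through opt_a's mutation

# A defensive fix is to copy at the constructor boundary, e.g.:
#   self.param_groups = list(param_groups)
```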