Ma, Guokai issues

Results: 19 issues of Ma, Guokai

[RFC] add device abstraction to allow other device than CUDA be used

This is a proposal to add a device abstraction to DeepSpeed. Currently DeepSpeed has CUDA hard-coded, which makes it work only on devices with a CUDA abstraction. In order to make...

pre-commit check for torch.cuda in code

This PR adds checks for CUDA-specific code in DeepSpeed. The purpose is to avoid accidental use of CUDA code in new code. Two kinds of code are checked for:...
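
As a rough illustration of what such a check could look like, here is a minimal sketch of a source scanner. The function name `find_cuda_usage` and the two patterns are assumptions made for this example; the snippet above is truncated before it names the two kinds of code the PR actually checks, so this is not the PR's rule set.

```python
import re

# Illustrative patterns only (assumptions, not the PR's actual rules).
CUDA_PATTERNS = [
    re.compile(r"\btorch\.cuda\b"),  # direct use of the torch.cuda module
    re.compile(r"\.cuda\(\)"),       # tensor or module moved with .cuda()
]


def find_cuda_usage(source: str):
    """Return (line_number, stripped_line) pairs for lines matching any pattern."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Naive comment handling: ignore everything after the first '#'.
        code = line.split("#", 1)[0]
        if any(p.search(code) for p in CUDA_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


sample = """import torch
x = torch.zeros(4).cuda()
n = torch.cuda.device_count()
y = torch.zeros(4)  # torch.cuda is mentioned only in a comment
"""

for lineno, line in find_cuda_usage(sample):
    print(f"{lineno}: {line}")
```

A script like this could be wired into a pre-commit hook that fails when the hit list is non-empty. A real check would also need to handle strings and multi-line constructs, which this sketch ignores.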

[CPU support] Optionally bind each rank to different cores on host

This PR adds two command-line options to the `deepspeed` command to help support the CPU as a virtual accelerator and to utilize the vector or tensor computation provided by processors with the AVX2/AVX512/AMX instruction sets, which...

[CPU] Support Intel CPU inference

### Summary: This PR adds Intel CPU support to DeepSpeed by extending the [DeepSpeedAccelerator](https://github.com/microsoft/DeepSpeed/blob/master/accelerator/abstract_accelerator.py) interface. It allows users to run LLM inference on Intel CPUs with Auto Tensor Parallelism or...

Documentation for DeepSpeed Accelerator Abstraction Interface

This PR adds a tutorial document on how to use the DeepSpeed Accelerator Abstraction Interface to write accelerator-agnostic DeepSpeed models; how to run a DeepSpeed model on different accelerator...

[profiling]add show_straggler argument to log_summary()

In tensor-parallel inference, the straggler effect is one of the factors that impact scaling efficiency. Between any two allreduce operations in tensor parallelism, one worker may run slower than the other workers,...
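
To make the straggler effect concrete, the sketch below estimates how long each worker sits idle waiting for the slowest one. It assumes a simplified model in which every allreduce completes only once all ranks have arrived, and the function name `straggler_wait` and the timing values are hypothetical. It is not the implementation behind `show_straggler`.

```python
def straggler_wait(compute_times):
    """Estimate per-rank idle time caused by stragglers.

    compute_times[step][rank] is the time (in seconds) a rank spends
    computing before the allreduce at that step. Under the assumed model,
    each rank waits until the slowest rank reaches the allreduce.
    """
    num_ranks = len(compute_times[0])
    waits = [0.0] * num_ranks
    for step in compute_times:
        slowest = max(step)
        for rank, t in enumerate(step):
            waits[rank] += slowest - t
    return waits


# Hypothetical timings for two ranks over two allreduce intervals.
times = [
    [1.0, 1.5],  # rank 1 is the straggler in the first interval
    [2.0, 1.0],  # rank 0 is the straggler in the second interval
]
waits = straggler_wait(times)
print(waits)
```

In this model a rank that is never the slowest accumulates the most waiting time, which is the time lost to scaling inefficiency.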

(Do not merge) (CPU) aggregation of few recent fixes/optimizations

This PR aggregates a few recent fixes in order to support a customer. It contains the following PRs along with some other minor fixes: - [ ] Fix for moe...

[Bug fix] Fix cpu inference UT failure

This PR fixes the UT error described in this PR and the following test job. This PR skips `TestModelTask` if dtype is not supported by the accelerator, or `InferenceBuilder` is...

benchmark-cublas.cu contains a bug

Matrix B is all -1 because the result of the second rand() call (line 17) is not converted to double before being divided by an integer.

Workflow for AutoTP

This PR adds a new extensible workflow for automatic tensor parallelism (https://www.deepspeed.ai/tutorials/automatic-tensor-parallelism/). The workflow aims to provide a way to validate AutoTP for LLM models.