Ma, Guokai

Results 19 issues of Ma, Guokai

This is a proposal to add a device abstraction to DeepSpeed. Currently DeepSpeed has CUDA hard-coded, which makes it work only for devices with a CUDA abstraction. In order to make...
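The idea of a device abstraction can be illustrated with a minimal, hypothetical sketch. The names below (`Accelerator`, `CudaLikeAccelerator`, `CpuAccelerator`, `device_name`, `select_accelerator`, `place_on_device`) are assumptions made for illustration and are not DeepSpeed's actual interface. The point is only that calling code asks the abstraction for a device string instead of hard-coding `"cuda"`.

```python
from abc import ABC, abstractmethod


class Accelerator(ABC):
    """Hypothetical minimal device interface (not DeepSpeed's real API)."""

    @abstractmethod
    def device_name(self, index=None):
        """Return a device string such as 'cuda', 'cuda:0' or 'cpu'."""


class CudaLikeAccelerator(Accelerator):
    def device_name(self, index=None):
        return "cuda" if index is None else f"cuda:{index}"


class CpuAccelerator(Accelerator):
    def device_name(self, index=None):
        # Assumption: a CPU backend exposes one logical device,
        # so the index is ignored.
        return "cpu"


def select_accelerator(name):
    """Pick a backend by name (hypothetical registry)."""
    backends = {"cuda": CudaLikeAccelerator, "cpu": CpuAccelerator}
    return backends[name]()


def place_on_device(accel, index):
    # Device-agnostic caller: unchanged regardless of the backend chosen.
    return accel.device_name(index)
```

With this shape, supporting a new device means adding one backend class rather than editing every call site that previously assumed CUDA.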

This PR adds checks for CUDA-specific code in DeepSpeed. The purpose is to avoid accidental use of CUDA code in new code. Two kinds of code are checked against:...

This PR adds two command-line options to the `deepspeed` command to help support the CPU as a virtual accelerator and to utilize the vector or tensor computation provided by processors with the AVX2/AVX512/AMX instruction sets, which...

### Summary: This PR adds Intel CPU support to DeepSpeed by extending the [DeepSpeedAccelerator](https://github.com/microsoft/DeepSpeed/blob/master/accelerator/abstract_accelerator.py) interface. It allows users to run LLM inference on Intel CPUs with Auto Tensor Parallelism or...

This PR adds a tutorial document on how to use the DeepSpeed Accelerator Abstraction Interface to write accelerator-agnostic DeepSpeed models; how to run a DeepSpeed model on different accelerator...

In tensor-parallel inference, the straggler effect is one of the factors that impact scaling efficiency. Between any two allreduce operations of tensor parallelism, one worker may run slower than the other workers,...
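A toy model can make the straggler effect concrete. The sketch below assumes that each allreduce behaves as a barrier (no worker proceeds until all have arrived) and ignores communication time. The function names and the timing numbers are made up for illustration and are not measurements.

```python
def wall_time(segments):
    """Wall time when every allreduce acts as a barrier.

    `segments[i][w]` is the compute time of worker `w` in the stretch
    before the i-th allreduce; each stretch costs as much as its
    slowest worker.
    """
    return sum(max(worker_times) for worker_times in segments)


def busiest_worker_time(segments):
    """Largest total compute time accumulated by any single worker."""
    num_workers = len(segments[0])
    return max(sum(seg[w] for seg in segments) for w in range(num_workers))


# Illustrative numbers: a different worker is slow in each stretch.
segments = [
    [1.0, 1.0, 1.0, 1.4],
    [1.3, 1.0, 1.0, 1.0],
]
```

Here `wall_time(segments)` is about 2.7 while `busiest_worker_time(segments)` is about 2.4: because the slow worker changes from one stretch to the next, the delays add up and the run takes longer than the compute time of any individual worker.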

This PR is an aggregation of a few recent fixes in order to support a customer. This PR contains the following PRs with some other minor fixes: - [ ] Fix for moe...

This PR fixes the unit test error described in this PR and the following test job. This PR skips `TestModelTask` if the dtype is not supported by the accelerator, or `InferenceBuilder` is...

Matrix B is all -1 because the result of the second rand() call (line 17) is not converted to double before being divided by an integer, so the division is performed as integer division.
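The original code is not shown here, so the following is only a sketch under assumptions. It supposes each element was computed with an expression of the form `2 * (rand() / RAND_MAX) - 1`, and it uses Python's `//` on non-negative integers to stand in for C integer division. The `RAND_MAX` value and the function names are likewise assumptions.

```python
RAND_MAX = 2**31 - 1  # assumed value; RAND_MAX is platform dependent in C


def element_buggy(r):
    # Integer division: r // RAND_MAX is 0 for every r below RAND_MAX,
    # so the expression collapses to 2 * 0 - 1 == -1.
    return 2 * (r // RAND_MAX) - 1


def element_fixed(r):
    # Converting to floating point before dividing keeps the fraction,
    # so results are spread over [-1.0, 1.0].
    return 2 * (float(r) / RAND_MAX) - 1
```

Under these assumptions, every sample other than `RAND_MAX` itself yields -1 in the buggy version, which matches the symptom of a matrix filled with -1.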

This PR adds a new extensible workflow for automatic tensor parallelism (https://www.deepspeed.ai/tutorials/automatic-tensor-parallelism/). The workflow aims to provide a way to validate AutoTP for LLM models.