
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Results 1165 DeepSpeed issues

1. `Tensor.bmm` is missing from the `_patch_tensor_methods` function. 2. Functions are missing from the `_reload_functionals` and `_reload_tensor_methods` functions. 3. `torch.mm` and `torch.Tensor.mm` will have the same `__name__` in `wrapFunc`; my suggestion is to use `__str__` instead....
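The collision described in point 3 can be sketched in plain Python. The stand-in `Tensor` class, `mm` function, and the two `wrapFunc_*` helpers below are hypothetical illustrations (the real `wrapFunc` lives in DeepSpeed's inference patching code, and the real objects are `torch.mm` / `torch.Tensor.mm`); the point is only that keying wrapped functions on `__name__` conflates the two, while `str()` keeps them distinct.

```python
# Plain-Python stand-ins for torch.mm and torch.Tensor.mm, which share
# the unqualified name "mm" (no torch install needed for the sketch).
class Tensor:
    def mm(self, other):       # stands in for torch.Tensor.mm
        pass

def mm(a, b):                  # stands in for torch.mm
    pass

wrapped_by_name = {}

def wrapFunc_by_name(func):
    # Keying on __name__ collides: both functions are named "mm",
    # so the second registration silently overwrites the first.
    wrapped_by_name[func.__name__] = func

wrapFunc_by_name(mm)
wrapFunc_by_name(Tensor.mm)
print(len(wrapped_by_name))    # -> 1: one "mm" clobbered the other

wrapped_by_str = {}

def wrapFunc_by_str(func):
    # str(func) includes the qualified name ("mm" vs "Tensor.mm"),
    # so the two entries remain distinct.
    wrapped_by_str[str(func)] = func

wrapFunc_by_str(mm)
wrapFunc_by_str(Tensor.mm)
print(len(wrapped_by_str))     # -> 2: both functions kept
```

Using `func.__qualname__` would work equally well here, since it also distinguishes `mm` from `Tensor.mm`.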

With the recent release of torch 1.12, we saw all unit tests using the `@distributed_test` decorator break (see [this issue](https://github.com/pytorch/pytorch/issues/68256)). The problem involves changes in `torch.multiprocessing` and `torch.distributed` that prevent...

My code is quite similar to some GNN structures: [NN_output = graph.forward(NN_input, types="f")](https://gist.github.com/buttercutter/b6f526c56e20f029d68e6f9041c3f5c0/3d1a3e6844680545fb8b75225267ea625ba9df5b#file-gdas-py-L665). So [outputs = model_engine(inputs)](https://github.com/microsoft/DeepSpeedExamples/blob/36212dd59cb3eb342c39bc8965aaba04d5491933/cifar/cifar10_deepspeed.py#L281) does not seem to fit my case? `args` also does...

**Describe the bug** DeBERTa has poor performance when using ZeRO Stage 3. stdout shows continuous warnings: ```bash [stage3.py:104:_apply_to_tensors_only] A module has unknown inputs or outputs type () and the tensors...

bug

I am currently running some tests with ZeRO-3 Infinity and have run into some problems; I would appreciate your help. **Machine configuration**: two nodes, each with one A100-PCIE-40GB and 126 GB RAM...

**Describe the bug** pip install is not working on Windows 10. **To Reproduce** Steps to reproduce the behavior: 1. Go to Command Prompt 2. Type in 'pip install deepspeed' 3. Voila...

bug

**Describe the bug** Running a forward pass on a `DeepSpeedTransformerInference` layer, with a sequence length of ~1000 tokens, results in an illegal memory access CUDA error. **To Reproduce** Here is...

bug
inference

**Describe the bug** Running a forward pass on a `DeepSpeedTransformerInference` layer, with a sequence length of ~1000 tokens, results in a Device Index runtime error. **To Reproduce** Here is a...

bug
inference

Right now OPT (https://huggingface.co/docs/transformers/model_doc/opt) can only be supported via a custom kernel injection policy. It would be great if there were official support. Thanks!

enhancement
inference