DeepSpeedExamples

Example models using DeepSpeed

Results: 323 DeepSpeedExamples issues

When I run the demo (step3_rlhf_finetuning/training_scripts/opt/single_node/run_1.3b.sh) without any changes, the reward does not increase. Is this normal? I would appreciate it if anyone could provide a normal reward...

When training the PPO model, I turned on gradient_checkpointing_enable. If you want to calculate the ptx loss, the actor will forward twice. In your code, these two losses are executed...

deepspeed chat
new-config
modeling
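
Regarding the issue above: as a point of reference, here is a minimal, hypothetical sketch of a PPO update step that also applies a ptx (pretraining) loss, which is why the actor runs a second forward pass. The batch fields, helper names, and the 27.8 coefficient are illustrative assumptions, not the repository's actual code.

```python
import torch

def gather_log_probs(logits, labels):
    # Log-probability of each realized token (next-token alignment).
    logp = torch.log_softmax(logits, dim=-1)
    return logp.gather(-1, labels.unsqueeze(-1)).squeeze(-1)

def ppo_clip_loss(logprobs, old_logprobs, advantages, mask, cliprange=0.2):
    # Standard clipped PPO surrogate, averaged over generated tokens only.
    ratio = torch.exp(logprobs - old_logprobs)
    loss1 = -advantages * ratio
    loss2 = -advantages * torch.clamp(ratio, 1.0 - cliprange, 1.0 + cliprange)
    return (torch.max(loss1, loss2) * mask).sum() / mask.sum()

def train_step(actor_engine, ppo_batch, ptx_batch, ptx_coef=27.8):
    # Forward pass 1: PPO policy loss on the rollout batch.
    out = actor_engine(ppo_batch["input_ids"],
                       attention_mask=ppo_batch["attention_mask"])
    logprobs = gather_log_probs(out.logits[:, :-1, :],
                                ppo_batch["input_ids"][:, 1:])
    actor_loss = ppo_clip_loss(logprobs, ppo_batch["old_logprobs"],
                               ppo_batch["advantages"], ppo_batch["action_mask"])
    actor_engine.backward(actor_loss)

    # Forward pass 2: ptx (plain language-modeling) loss on pretraining data.
    # With gradient checkpointing enabled, activations are also recomputed
    # during each backward, so the extra forward is paid again there.
    ptx_loss = actor_engine(ptx_batch["input_ids"],
                            attention_mask=ptx_batch["attention_mask"],
                            labels=ptx_batch["labels"]).loss
    actor_engine.backward(ptx_coef * ptx_loss)

    actor_engine.step()
```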

Hi, I read the DeepSpeed docs and have the following questions: (1) What's the difference between these methods when running inference with LLMs? a. deepspeed.initialize and then write code to generate...
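
On question (1): a minimal, hedged sketch of the two entry points, assuming a Hugging Face causal LM (model name and generation settings are placeholders). `deepspeed.initialize` returns a training engine (optimizer, ZeRO, backward/step) whose wrapped module you call `.generate()` on yourself, whereas `deepspeed.init_inference` builds an inference engine that can inject optimized kernels for generation.

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "facebook/opt-1.3b"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.half)

# a. deepspeed.initialize: a *training* engine (ZeRO, optimizer, gradients);
#    for generation you would call engine.module.generate(...) yourself.
# engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config)

# b. deepspeed.init_inference: an *inference* engine with optional fused-kernel
#    injection and tensor parallelism, intended for serving/generation only.
engine = deepspeed.init_inference(model,
                                  dtype=torch.half,
                                  replace_with_kernel_inject=True)

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(engine.module.device)
outputs = engine.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```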

Fixed two issues: * Padding tokens should be ignored in training; their labels should be set to `-100` so `CrossEntropyLoss` ignores them. * Append the correct `eos_token` to the response text....
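
For illustration, a small sketch of both fixes (the tokenizer choice and max length are assumptions): padded positions get label `-100`, the ignore index that `CrossEntropyLoss` skips, and the tokenizer's `eos_token` is appended to the response before tokenization.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")  # placeholder
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def build_example(prompt, response, max_len=512):
    # Append the correct eos_token so the model learns where responses end.
    text = prompt + response + tokenizer.eos_token
    enc = tokenizer(text, max_length=max_len, padding="max_length",
                    truncation=True, return_tensors="pt")
    labels = enc["input_ids"].clone()
    # Padding should be ignored in training: set those labels to -100
    # so CrossEntropyLoss skips them.
    labels[enc["attention_mask"] == 0] = -100
    enc["labels"] = labels
    return enc
```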

I am trying to run the `training/HelloDeepSpeed` example in a fresh Python virtualenv but am getting the error below. I installed the dependencies using https://github.com/microsoft/DeepSpeedExamples/blob/master/training/HelloDeepSpeed/requirements.txt ``` Traceback (most recent call last): File "/media/home/hemant/src/DeepSpeedExamples/training/HelloDeepSpeed/train_bert.py",...

I benchmarked MII with the run_example.sh script located at "DeepSpeedExamples/benchmarks/inference/mii" in the repository, but it stalled as follows: ![image](https://github.com/microsoft/DeepSpeedExamples/assets/32950022/b681615b-0abc-4f6a-9123-6624989902e4) Then after a few minutes it...

**Problem:** I have a previously-trained model state-dict file, e.g., a reward model saved as `PATH/pytorch_model.bin`. When I try to reload it for further training with the ZeRO-3 optimizer, an error...

deepspeed chat
new-config
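
Not the repository's official fix, but one common workaround is sketched below under stated assumptions (the model class, checkpoint path, and ds_config are placeholders): load the plain state dict into the model before calling deepspeed.initialize, because once ZeRO-3 has partitioned the parameters, a plain load_state_dict on the wrapped module no longer matches the sharded layout.

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

ds_config = {                       # placeholder ZeRO-3 training config
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 3},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
}

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # placeholder base

# Load the previously trained weights *before* DeepSpeed wraps the model;
# after ZeRO-3 partitioning, the full parameters no longer live on one rank.
state_dict = torch.load("PATH/pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict, strict=False)

engine, optimizer, _, _ = deepspeed.initialize(model=model,
                                               model_parameters=model.parameters(),
                                               config=ds_config)
```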

When using the hybrid engine, the output sequence is always 'a a a a ', while if I disable the hybrid engine the output sequence is correct. Here is my log...

deepspeed chat
hybrid engine

My training environment is a Docker image pulled from `deepspeed/deepspeed:v072_torch112_cu117`, and I run it with `docker run -it --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --network train-net --name fuyx-work -v...

bug
deepspeed chat
hybrid engine

**Repo Link**: [Stable-Diffusion inference](https://github.com/microsoft/DeepSpeedExamples/tree/master/inference/huggingface/stable-diffusion) **Command used to run**: `deepspeed --num_gpus 1 test-stable-diffusion.py` **Envs**: RTX 3090, deepspeed 0.12.6, torch 1.13.1, diffusers 0.26.1, triton 2.0.0.dev20221202 **Traceback information**: ![image](https://github.com/microsoft/DeepSpeedExamples/assets/61218792/03846283-a343-4a79-8f8a-366300d5323e) It...