DeepSpeedExamples

Example models using DeepSpeed

274 DeepSpeedExamples issues

Hi, I followed the script to train the BLOOM model on my own dataset. However, I found that it saved the model differently compared to other models such as...

Hello, I greatly appreciate the RLHF repository you have provided. Previously, I was using trlx, but after switching to this repository, my main concern is experiment logging and evaluation....

> ```
> # Copyright (c) Microsoft Corporation.
> # SPDX-License-Identifier: Apache-2.0
>
> # DeepSpeed Team
>
>
> ACTOR_ZERO_STAGE="--actor_zero_stage 0"
> CRITIC_ZERO_STAGE="--critic_zero_stage 0"
> ACTOR_MODEL_PATH="EleutherAI/polyglot-ko-5.8b_base_model"
> CRITIC_MODEL_PATH="gpt2-medium_base_RM"
> ...
> ```

When I modified `run_example.sh` and changed the backend to vllm, I got the error message below. I will check whether the error comes from the server side or...

This PR adds a `--client-only` flag to the MII benchmark, allowing the benchmark to skip `start_server` and `stop_server` when running with a backend such as vllm. This flag provides the flexibility to start...
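A minimal sketch of the gating pattern such a flag implies, assuming the benchmark wraps its client run between a server start and a server stop. The helpers `start_server`, `stop_server`, and `run_clients` below are hypothetical stand-ins (they only record events), the `"mii"` default backend is a hypothetical placeholder, and none of this is the actual DeepSpeed-MII benchmark code:

```python
import argparse


# Hypothetical stand-ins for the benchmark's server lifecycle; they only
# record what would have happened so the control flow is easy to inspect.
def start_server(args, events):
    events.append("start_server")


def stop_server(args, events):
    events.append("stop_server")


def run_clients(args, events):
    events.append(f"run_clients:{args.backend}")


def main(argv=None):
    parser = argparse.ArgumentParser(description="Sketch of a client-only benchmark mode")
    parser.add_argument("--backend", default="mii")  # hypothetical default
    # argparse stores "--client-only" as args.client_only
    parser.add_argument(
        "--client-only",
        action="store_true",
        help="assume a server (e.g. vllm) is already running; do not start or stop it",
    )
    args = parser.parse_args(argv)

    events = []
    if not args.client_only:
        start_server(args, events)
    try:
        run_clients(args, events)
    finally:
        # Only tear down a server that this process started itself
        if not args.client_only:
            stop_server(args, events)
    return events


if __name__ == "__main__":
    print(main())
```

With `--client-only`, the run touches only the clients, so a server launched separately (for example, by hand with its own options) is left untouched before and after the benchmark.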

I have put the `Dahous/rm-static` dataset as well as the model `facebook/opt-1.3b` under the path **DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning**. When running the command `bash training_scripts/opt/single_gpu/run_1.3b.sh`, there seem to be some problems loading...

Error message:
```sh
localhost: ssh: connect to host localhost port 22: Connection refused
pdsh@mla****-worker: localhost: ssh exited with exit code 255
[2024-04-20 17:29:09,147] [INFO] [scheduler.py:430:clean_up] Done cleaning up exp_id =...
```

The Slurm command is as follows:
```
#!/bin/bash
#SBATCH --job-name=pretrain_7 # name
#SBATCH --nodes=2 # nodes
#SBATCH -w server-gpu-[10,15]
#SBATCH --ntasks-per-node=1 # crucial - only 1 task per dist per node!
...
```