DeepSpeedExamples

Example models using DeepSpeed

Results 323 DeepSpeedExamples issues

In the reward model implementation, I noticed these two lines of code: `c_truncated_reward = chosen_reward[divergence_ind:end_ind]` and `r_truncated_reward = rejected_reward[divergence_ind:end_ind]`. It should take the answer part, but chosen and rejected take...
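To make the two slices quoted above concrete, here is a toy sketch in plain Python. It assumes that `divergence_ind` is the first position where the chosen and rejected token sequences differ and that `end_ind` marks the end of the compared span; the helper `first_divergence`, the token ids, and the reward values are all made up for illustration and are not taken from the repository.

```python
def first_divergence(chosen_ids, rejected_ids):
    """Return the first index at which the two token sequences differ.

    Hypothetical helper for illustration only.
    """
    for i, (c, r) in enumerate(zip(chosen_ids, rejected_ids)):
        if c != r:
            return i
    # No difference found within the shared length.
    return min(len(chosen_ids), len(rejected_ids))


# Toy data (assumed): a shared 3-token prompt followed by different answers.
chosen_ids = [11, 12, 13, 21, 22, 23]
rejected_ids = [11, 12, 13, 31, 32, 33]

# One per-token reward value for each position (assumed values).
chosen_reward = [0.1, 0.2, 0.3, 0.9, 0.8, 0.7]
rejected_reward = [0.1, 0.2, 0.3, 0.2, 0.1, 0.0]

divergence_ind = first_divergence(chosen_ids, rejected_ids)
end_ind = len(chosen_ids)

# The same [divergence_ind:end_ind] window is applied to both sequences.
c_truncated_reward = chosen_reward[divergence_ind:end_ind]
r_truncated_reward = rejected_reward[divergence_ind:end_ind]
```

Under these assumptions both slices start right after the shared prompt, so each covers only its own answer tokens. Whether that matches the behavior the issue author expected cannot be determined from the truncated excerpt.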

bug
deespeed chat

When running step 3 with ZeRO stage 3 enabled for both the actor and critic models, I get the following error (line numbers may be offset due to debug statements...

bug
deespeed chat

![98DDB13F-60AE-4F7D-8979-9B287A2A4CC1](https://user-images.githubusercontent.com/39515647/233412075-f68a9c2b-24c8-426c-80d3-6f2c0e48b1ca.png)

deespeed chat
hybrid engine

Hi, I have finished training the following models: facebook/opt-1.3b (steps 1, 2, and 3) and facebook/opt-6.7b (step 1). **Here is the performance shown at the bottom of the chatbot.py script:** ``` Human:...

bug
deespeed chat

```
  File "main.py", line 334, in main
    save_hf_format(model, tokenizer, args)
  File ".../applications/DeepSpeed-Chat/training/utils/utils.py", line 51, in save_hf_format
    os.makedirs(output_dir)
  File "/usr/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists:...
```
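The traceback above ends in `os.makedirs(output_dir)` raising `FileExistsError`. As a general illustration (not necessarily the fix adopted in the repository), the standard-library `os.makedirs` accepts `exist_ok=True`, which suppresses the error when the directory already exists. The helper name `ensure_output_dir` below is hypothetical.

```python
import os
import tempfile


def ensure_output_dir(output_dir):
    """Create output_dir if needed; do nothing if it already exists.

    Hypothetical helper for illustration only.
    """
    os.makedirs(output_dir, exist_ok=True)
    return output_dir


# Demonstrate inside a throwaway temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "output")
    ensure_output_dir(target)
    ensure_output_dir(target)  # second call does not raise
    created = os.path.isdir(target)
```

Note that `exist_ok=True` only covers the case where the existing path is a directory; if a regular file already occupies that path, `FileExistsError` is still raised.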

I am trying to run step 3 of the RLHF examples using a RewardModel checkpoint that I trained using step 2 of the examples. For every step, I used the...

I was using the script `step3_rlhf_finetuning/training_scripts/single_node/run_6.7b.sh` and ran into some errors. I used 7B Llama models as the actor and critic, respectively, and set the `enable_hybrid_engine` argument. I got errors like the one below: ...

In `applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py`, `critic_loss` and `actor_loss` are, strangely, added to each other. I am confused about why.

I am trying to run the DeepSpeed-Chat example on a single GPU, an NVIDIA A6000 48 GB. I could run all 3 steps fine using the 1.3b example. But when I run `single_gpu/run_6.7b_lora.sh`, I got...

question
deespeed chat

**This is the error from training.log:**

> Traceback (most recent call last):
> File "/data/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/main.py", line 339, in main()
> File "/data/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/main.py", line 271, in main
> optimizer = AdamOptimizer(optimizer_grouped_parameters,
> File "/home/ps/anaconda3/envs/pt/lib/python3.10/site-packages/deepspeed/ops/adam/fused_adam.py", line...