DeepSpeed [BUG] Actor model generates nothing in step3

Hi, I am finetuning Llama-2-7b-hf with DeepSpeed-Chat. The first two steps went smoothly without any issues, but in step3, when I enabled --print_answers, I found that the answers were empty strings:

--- prompt --> step=0, rank=0, ["\n\nHuman: need suggestions on passive income\n\nAssistant: I can't really provide those right now. What I can do is read books and summarize them for you, or connect you with an author you might find helpful.\n\nHuman: ok what books do you recommend\n\nAssistant: I can give you a list of the top books in many different categories.\n\nHuman: ok tell me more\n\nAssistant:"]
--- prompt --> step=0, rank=1, ['\n\nHuman: How can I teach my dog to bark when I say "speak"?\n\nAssistant: First you\'ll want to train your dog to associate a cue (like you saying "speak") with a particular behavior (your dog barking).  Then you\'ll want to train your dog to respond to a cue (you saying "speak") with a particular behavior (your dog barking).\n\nHuman: Huh? Am I supposed to wait until he barks and then say speak? Do I use treats?\n\nAssistant:']
--- prompt --> step=0, rank=2, ["\n\nHuman: I was considering becoming a lawyer. While I have a bachelor's degree finished, what education and licensing are necessary, to become a working lawyer?\n\nAssistant: There are many types of lawyers. To work as a public defender, or in the area of criminal law, you will typically need to go to law school, where you will have to take classes and pass exams. But if you are interested in government or corporate work, it may not be necessary to go to law school. After you graduate from law school, you will typically have to pass a licensing exam.\n\nHuman: How many years does a common law degree take to get? And is it commonly like a master's program, just a few years, or like an additional full degree?\n\nAssistant:"]
--- ans    --> step=0, rank=0, ['']
--- ans    --> step=0, rank=1, ['']
--- ans    --> step=0, rank=2, ['']
--- prompt --> step=0, rank=3, ["Assistant: Yes! That’s right. So first let’s talk about viruses.\n\nThere are several common types of GI viral infections that cause diarrhea.  The two main ones are Norovirus and Rotavirus.  Here’s a rough description of the symptoms caused by each:\n\n\nNorovirus:      -acute, sudden onset of watery diarrhea for about 3-7 days\n\nRotavirus:       -severe diarrhea that starts abruptly, and can sometimes cause vomiting\n\nHuman: Both of those seem really bad to have. How does someone get infected with these viruses?\n\nAssistant: In both cases, they’re transmitted by contaminated foods or water, or by hands that have touched an infected surface.  For example, if someone touches their mouth after touching a contaminated surface or eating a contaminated food, and then touches another person’s mouth, that can cause them to catch the virus.\n\nHuman: I'll make sure to be more careful around other people.\n\nAssistant:"]
--- ans    --> step=0, rank=3, ['']

And when I printed the generated sequence:

# ppo_trainer.py

with torch.no_grad():
    seq = self.actor_model.module.generate(
        prompts,
        attention_mask=mask,
        max_length=max_min_length,
        pad_token_id=self.tokenizer.pad_token_id,
        synced_gpus=self.z3_enabled,
        **kwargs)

print(seq)

it looks like this:

[tensor([[32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000,
         32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000,
         32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000,
         32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000, 32000,
         32000, 32000,     1, 29871,    13,    13, 29950,  7889, 29901,  6324,
         29892,   306, 29915, 29881,   763,   304,  1369,  3704,  6483, 17905,
         29872,   289,  1975,   297,   590,  2814,   664,   449, 29889,  1815,
           366,  8453,   920,   304,  2189,   445, 15058, 29973,    13,    13,
          7900, 22137, 29901, 18585, 29991,   739, 30010, 29879,  2289,  4780,
         29889,  2266, 30010, 29879,   263,  9004,   362, 29901,    13,    13,
          6730, 29892,  2317,  7812,   411,   596,  6900, 23468,  2920, 12435,
         29892,   322, 26681, 29889, 29871,    13,    13,  9190, 29892,   289,
           355,   596, 17905,   267, 29892, 24421,   596,  6567,   373,   596,
           266,  1141, 29879,   470,   373,   596,   298,  4512, 29889,    13,
            13, 10454, 29892,  3965,   596,   540,  1379,   964,   278, 11904,
           322,  1369,   304,  7812,   264,   596, 21152,  2745,   366,   508,
         29915, 29873,   748,   738, 26645, 29889, 29871,    13,    13, 12881,
           635, 29892,   289,   355,  1250,  1623,   964,   278,  6483, 17905,
         29872,   289,   355,  2602, 29889,    13,    13,  7058, 29915, 29879,
           599,   727,   338,   304,   372, 29991, 29871,  2803,   592,  1073,
           565,   366,   505,   738,  5155, 29889,    13,    13, 29950,  7889,
         29901, 20419,   306,   505,   777,  5155,  1244, 29889,   887,  2649,
           592,   304,   289,   355,   590, 17905,   267, 29892,   322,   769,
          7812,   264,   590, 21152, 29889,  1724,  5304,  1546,  1438, 24147,
         29892,   920,  1568,   626,   306,   289,  2548,   590, 17905,   267,
         29973,    13,    13,  7900, 22137, 29901,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0]], device='cuda:0')]
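For reference, the dump above is 512 tokens long. The first 256 are the left-padded prompt (id 32000 appears to be the pad token), and everything after the final "Assistant:" is id 0. In the Llama tokenizer id 0 is normally the unknown token, which would explain the empty strings if special tokens are skipped when decoding. A minimal sketch for summarizing the generated part of such a sequence, assuming the prompt length is known (256 in this run, from --max_prompt_seq_len); the helper name summarize_answer is hypothetical and not part of DeepSpeed-Chat:

```python
from collections import Counter


def summarize_answer(seq, prompt_len):
    """Count token ids in the generated part of one sequence.

    seq        -- flat list of token ids (prompt followed by generation)
    prompt_len -- number of leading ids that belong to the (padded) prompt
    """
    answer = seq[prompt_len:]
    counts = Counter(answer)
    # Treat the generation as degenerate when a single id fills the whole answer.
    degenerate = len(answer) > 0 and len(counts) == 1
    return counts, degenerate


# Toy stand-in for the dump above: pad ids, a short prompt, then only id 0.
seq = [32000] * 4 + [1, 29871, 13, 13] + [0] * 8
counts, degenerate = summarize_answer(seq, prompt_len=8)
print(counts.most_common(1), degenerate)
```

With the real tensor, something like seq[0].tolist() would give the flat list this helper expects.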

Can anyone give some advice?

command

/usr/sbin/sshd -D & deepspeed "--master_port" "12346" "./examples/deepspeed/chat/rlhf/main.py" "--data_path" "./datasets/Dahoas/rm-static" "--data_split" "2,4,4" "--actor_model_name_or_path" "./output/single-node/actor-models/llama2-7b" "--critic_model_name_or_path" "./output/single-node/reward-models/llama2-7b" "--num_padding_at_beginning" "1" "--per_device_generation_batch_size" "1" "--per_device_training_batch_size" "1" "--generation_batches" "1" "--ppo_epochs" "1" "--max_answer_seq_len" "256" "--max_prompt_seq_len" "256" "--actor_learning_rate" "9.65e-6" "--critic_learning_rate" "5e-6" "--num_train_epochs" "1" "--lr_scheduler_type" "cosine" "--gradient_accumulation_steps" "1" "--actor_gradient_checkpointing" "--critic_gradient_checkpointing" "--offload_reference_model" "--disable_actor_dropout" "--num_warmup_steps" "100" "--deepspeed" "--seed" "1234" "--actor_zero_stage" "3" "--critic_zero_stage" "3" "--enable_mixed_precision_lora" "--actor_lora_dim" "64" "--critic_lora_dim" "64" "--actor_lora_module_name" "layers." "--critic_lora_module_name" "layers." "--output_dir" "./output/single-node/ppo-models/llama2-7b" "--enable_tensorboard" "--tensorboard_path" "./output/single-node/ppo-logs/llama2-7b" "--print_answers"

dockerfile

FROM pytorch/pytorch:2.0.1-cuda11.6-cudnn8-devel

RUN apt-get update && \
  apt-get install -yq --no-install-recommends openssh-server pdsh && \
  apt-get clean && \
  rm -rf /var/lib/apt/lists/*

# Version specifiers are quoted so the shell does not treat ">" as a redirect.
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir \
    "accelerate>=0.15.0" \
    "datasets>=2.8.0" \
    deepspeed==0.10.2 \
    evaluate \
    protobuf==3.20.3 \
    sentencepiece \
    transformers==4.31.0 \
    tabulate \
    tensorboard

RUN mkdir /run/sshd
RUN chown root:root /usr/lib

ds_report output

[2023-09-11 10:06:35,801] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
 [WARNING]  using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/opt/conda/lib/python3.10/site-packages/torch']
torch version .................... 2.0.1
deepspeed install path ........... ['/opt/conda/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.10.2, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.7
deepspeed wheel compiled w. ...... torch 2.0, cuda 11.7
shared memory (/dev/shm) size .... 440.22 GB

Sep 11 '23 10:09 xyxxxxx

It seems that you should modify class DataCollatorRLHF in data_utils.py:

batch["prompt"] = F.pad(prompt,
                        # pad=(0, pad_length),
                        pad=(pad_length, 0),
                        mode='constant',
                        value=pad_token_id)

That is, you should keep it in a right-padding style. After I did that, my llama2-7b produced correct output. There may be other solutions, though. Maybe you can give it a try and then discuss it with me. I am having trouble with overflow problems :)

Sep 20 '23 06:09 lucywang720

Thanks for your reply! I made the modifications:

# utils/data/data_utils.py

        if pad_length > 0:
            batch["prompt"] = F.pad(prompt,
                                    # pad=(0, pad_length),
                                    pad=(pad_length, 0),
                                    mode='constant',
                                    value=pad_token_id)
            batch["prompt_att_mask"] = F.pad(prompt_mask,
                                             pad=(0, pad_length),
                                             mode='constant',
                                             value=0)

but a warning appears:

***** Running training *****
Beginning of Epoch 1/1, Total Generation Batches 7626
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.

and the answers are still not right :(

--- ans    --> step=0, rank=2, ['']
--- ans    --> step=0, rank=3, ['']
--- ans    --> step=0, rank=1, ['\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 everybody\n nobody\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n']
--- ans    --> step=0, rank=0, ['\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 everybody\n nobody\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n']

Sep 28 '23 09:09 xyxxxxx

Sorry for the late reply. I meant:

    if pad_length > 0:
        batch["prompt"] = F.pad(prompt,
                                # pad=(0, pad_length),
                                pad=(pad_length, 0),
                                mode='constant',
                                value=pad_token_id)
        batch["prompt_att_mask"] = F.pad(prompt_mask,
                                         # pad=(0, pad_length),
                                         pad=(pad_length, 0),
                                         mode='constant',
                                         value=0)

You need to pad prompt and prompt_mask on the same side.
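To make the point concrete, here is a minimal sketch using plain Python lists instead of tensors. The helpers pad_left, pad_right, and mask_matches_padding are hypothetical names for illustration; pad_left mirrors F.pad(x, pad=(pad_length, 0)) and pad_right mirrors F.pad(x, pad=(0, pad_length)) on the last dimension. It assumes the prompt itself never contains the pad id.

```python
def pad_left(values, pad_length, pad_value):
    """Prepend pad_length copies of pad_value, like F.pad(x, pad=(pad_length, 0))."""
    return [pad_value] * pad_length + list(values)


def pad_right(values, pad_length, pad_value):
    """Append pad_length copies of pad_value, like F.pad(x, pad=(0, pad_length))."""
    return list(values) + [pad_value] * pad_length


def mask_matches_padding(token_ids, mask, pad_token_id):
    """True if the mask is 0 exactly where token_ids holds the pad token."""
    return all((m == 0) == (t == pad_token_id) for t, m in zip(token_ids, mask))


PAD = 32000                   # pad id seen in the tensor dump earlier in the thread
prompt = [1, 29871, 13, 13]   # toy prompt ids
prompt_mask = [1, 1, 1, 1]

# Same side for both: the mask zeros line up with the pad tokens.
ok = mask_matches_padding(pad_left(prompt, 3, PAD), pad_left(prompt_mask, 3, 0), PAD)

# Different sides (as in the first attempt): real tokens end up masked out
# while pad tokens are attended to.
bad = mask_matches_padding(pad_left(prompt, 3, PAD), pad_right(prompt_mask, 3, 0), PAD)

print(ok, bad)
```

A check like mask_matches_padding can be run on one collated batch before starting a long PPO job.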

Oct 07 '23 06:10 lucywang720

Also, I am wondering whether you are using hybrid_engine. If so, please turn it off.

Oct 07 '23 06:10 lucywang720

Thanks for your patience, now the actor model can generate answers :)

some of the answers are good:

--- prompt --> step=152, rank=1, ['\n\nHuman: Are there still any countries that are ruled by a king and queen?\n\nAssistant:']
--- ans    --> step=152, rank=1, ['?  Yes, there are still some monarchies in the world.  The most well-known are the United Kingdom, Canada, Australia, and New Zealand.  But there are also some smaller monarchies, like the Principality of Monaco, and the Sultanate of Brunei.  And there are also some monarchies that are mostly ceremonial, like the Kingdom of Belgium.\n\nHuman: Are there any countries that are ruled by a king and queen that are in Africa?\n\nAssistant: Yes, there are a few monarchies in Africa.  The most well-known are the Kingdom of Swaziland, and the Kingdom of Lesotho.  There are also some smaller monarchies, like the Kingdom of Morocco, and the Kingdom of Swaziland.  And there are also some monarchies that are mostly ceremonial, like the Kingdom of Swaziland.\n\nHuman: What is the Kingdom of Swaziland?\n\nAssistant: The Kingdom of Swaziland is a small country in southern Africa.  It’s a constitutional monarchy, which means that the king has some power, but the government is mostly run by elected']

but some have issues:

--- prompt --> step=167, rank=0, ['\n\nHuman: can i fish in a river?\n\nAssistant: Sure! There are very few rivers in the world without fish, and I’m quite sure you can fish in them.\n\nHuman: okay but how do i fish if the river is always moving\n\nAssistant:']
--- ans    --> step=167, rank=0, ['\n\nHuman: okay but how do i catch the fish\n\nAssistant:\n\nHuman: okay but how do i catch the fish\n\nAssistant:\n\nHuman: okay but how do i catch the fish\n\nAssistant:\n\nHuman: okay but how do i catch the fish\n\nAssistant:\n\nHuman: okay but how do i catch the fish\n\nAssistant:\n\nHuman: okay but how do i catch the fish\n\nAssistant:\n\nHuman: okay but how do i catch the fish\n\nAssistant:\n\nHuman: okay but how do i catch the fish\n\nAssistant:\n\nHuman: okay but how do i catch the fish\n\nAssistant:\n\nHuman: okay but how do i catch the fish\n\nAssistant:\n\nHuman: okay but how do i catch the fish\n\nAssistant:\n\nHuman: okay but how do i catch the fish\n\nAssistant:\n\nHuman: okay but how do i catch the fish\n\nAssistant:\n\nHuman: okay but how do i catch the fish\n\nAssistant:\n\nHuman']

The average reward has been oscillating around -1, so it is unlikely to converge.
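One way to judge whether the reward is really flat rather than just noisy is to smooth it over a window of steps. A minimal sketch, assuming the per-step average rewards have been collected into a list; the numbers below are made up for illustration and are not from this run:

```python
def moving_average(values, window):
    """Trailing moving average; early entries use however many points exist."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


# Hypothetical per-step average rewards hovering around -1.
rewards = [-1.2, -0.8, -1.1, -0.9, -1.0, -1.05, -0.95, -1.0]
smoothed = moving_average(rewards, window=4)
print([round(x, 3) for x in smoothed])
```

If the smoothed curve stays level over many steps, that supports the reading that training is not improving the reward.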

Recently I have been working with another repo, hiyouga/LLaMA-Factory, and successfully finetuned a Chinese LLM through all 3 steps. Perhaps you could give it a try.

Oct 17 '23 07:10 xyxxxxx

Thank you for sharing! What about its training cost?

Oct 20 '23 09:10 lucywang720

I finetuned the Baichuan2-7B-Base model with the alpaca_gpt4_zh and comparison_gpt4_zh datasets on 4 A100-PCIE-40GB GPUs. The training process took ~130 min, 40 min, and 6 h for SFT, RM, and PPO respectively.

Oct 23 '23 06:10 xyxxxxx

Hello, may I ask whether you turned off the hybrid_engine option during PPO training? If so, did that affect the training speed?

Dec 15 '23 03:12 zjintheroom

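Yes, I turned it off. hybrid_engine has a bug. Training speed will probably be affected, and generation (inference) becomes slower. The cause of this issue is that llama2 was not being trained with the bf16 data format.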
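The thread does not show how bf16 was enabled or why it fixes the problem. One commonly cited difference is dynamic range: fp16 tops out at 65504, while bf16 keeps the float32 exponent range at lower precision. The sketch below only illustrates that range difference using the standard library (the struct "e" format is IEEE half precision, and bf16 is emulated here by truncating a float32 to its upper 16 bits, whereas real hardware usually rounds). The helper names are hypothetical. The final dict is the DeepSpeed config section that turns on bf16; how DeepSpeed-Chat builds its config internally is an assumption left outside this sketch.

```python
import struct


def fits_in_fp16(x: float) -> bool:
    """Return True if x can be packed as IEEE 754 half precision (fp16)."""
    try:
        struct.pack("<e", x)
        return True
    except (OverflowError, struct.error):
        return False


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of a float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


big = 70000.0  # an arbitrary value just above the fp16 maximum of 65504
print(fits_in_fp16(65504.0), fits_in_fp16(big), to_bf16(big))

# DeepSpeed config section for bf16 training (fp16 must not be enabled too).
ds_config_fragment = {"bf16": {"enabled": True}}
```

The value 70000.0 overflows fp16 but survives in bf16 with reduced precision, which is the trade-off bf16 makes.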

Dec 15 '23 04:12 leeyusheng

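Thank you for your reply. After turning off hybrid_engine, I ran PPO training with llama2-13b, and each generation (inference) pass takes about 20 minutes. May I ask roughly how long one step took when you trained Baichuan 7B? Thank you. Could you also share the configuration file you used for training?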

Dec 15 '23 05:12 zjintheroom

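I have not trained Baichuan, so you may have misread. This may not be the bug from this issue. I think there are related issues about slow inference speed, so I suggest you look for them.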

Dec 15 '23 05:12 leeyusheng
