I am training the PPO model with GPT-2. I changed the model_name_or_path option from opt to gpt2. Steps 1 and 2 completed successfully, but an error occurred in step 3. The error is as follows:
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /data/luojiangang/423_Deep/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_fin │
│ etuning/main.py:522 in <module> │
│ │
│ 519 │
│ 520 │
│ 521 if __name__ == "__main__": │
│ ❱ 522 │ main() │
│ 523 │
│ │
│ /data/luojiangang/423_Deep/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_fin │
│ etuning/main.py:431 in main │
│ │
│ 428 │ │ │ │ prompts = prompts[:, length - args.max_prompt_seq_len:] │
│ 429 │ │ │ │ raise ValueError("Prompt length is too long") │
│ 430 │ │ │ │
│ ❱ 431 │ │ │ out = trainer.generate_experience(prompts) │
│ 432 │ │ │ exp_dataset = exp_mini_dataset.add(out) │
│ 433 │ │ │ │
│ 434 │ │ │ if exp_dataset is not None: │
│ │
│ /data/luojiangang/423_Deep/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_fin │
│ etuning/ppo_trainer.py:97 in generate_experience │
│ │
│ 94 │ │
│ 95 │ def generate_experience(self, prompts): │
│ 96 │ │ self.eval() │
│ ❱ 97 │ │ seq = self._generate_sequence(prompts) │
│ 98 │ │ self.train() │
│ 99 │ │ │
│ 100 │ │ pad_token_id = self.tokenizer.pad_token_id │
│ │
│ /data/luojiangang/423_Deep/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_fin │
│ etuning/ppo_trainer.py:91 in _generate_sequence │
│ │
│ 88 │ │ │ │ continue │
│ 89 │ │ │ else: │
│ 90 │ │ │ │ out_seq.append(seq[i:i + 1]) │
│ ❱ 91 │ │ out_seq = torch.cat(out_seq, dim=0) # concate output in the batch dim │
│ 92 │ │ │
│ 93 │ │ return out_seq │
│ 94 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: torch.cat(): expected a non-empty list of Tensors
torch.Size([4, 50264])
torch.Size([4, 50264])
!!!! kernel execution error. (m: 2048, n: 4, k: 2048, error: 14)
!!!! kernel execution error. (m: 8192, n: 4, k: 2048, error: 13)
!!!! kernel execution error. (m: 2048, n: 4, k: 2048, error: 13)
Do you know what causes this? Can you provide the training steps for GPT-2? Looking forward to your reply.
Has this problem been solved, friend?
Same question! Have you found a solution?
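Not a confirmed fix, but one reading of the traceback: the loop just above line 91 of ppo_trainer.py skips some generated sequences with continue, and if every sequence in the batch is skipped, out_seq stays empty and torch.cat raises exactly this error. Below is a minimal sketch of a guard for that case. The helper name keep_nonempty_answers, its arguments, and the "more than one non-pad answer token" rule are assumptions for illustration, since the actual skip condition is not visible in the traceback.

```python
import torch


def keep_nonempty_answers(seq, prompt_length, pad_token_id):
    """Hypothetical helper: keep only rows whose generated answer is non-trivial.

    Returns a tensor of the kept rows, or None when no row survives.
    """
    # Everything after the prompt is treated as the generated answer.
    ans = seq[:, prompt_length:]
    # Count the non-pad answer tokens in each row.
    valid_ans_len = (ans != pad_token_id).sum(dim=-1)

    # Assumed rule: drop rows with at most one real answer token.
    out_seq = [
        seq[i:i + 1] for i in range(seq.size(0)) if valid_ans_len[i] > 1
    ]

    if not out_seq:
        # torch.cat on an empty list raises a RuntimeError,
        # so report "nothing usable" to the caller instead.
        return None
    return torch.cat(out_seq, dim=0)  # concatenate along the batch dim
```

When this returns None the caller would have to skip that batch. It only avoids the crash; it does not explain why every GPT-2 answer came back empty, which is probably the real thing to check.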