DeepSpeedExamples

Can the program support longer answer_seq and prompt_seq lengths?

Open lljjgg opened this issue 2 years ago • 0 comments

I ran the test program with `python train.py --actor-model facebook/opt-13b --reward-model facebook/opt-350m --num-gpus 8` and it runs normally. But when I changed the parameters to `max_answer_seq_len=1024` and `max_prompt_seq_len=1024` in `run_1.3b.sh`, the program reported an error:

```
Time to load utils op: 0.00037217140197753906 seconds
Traceback (most recent call last):
  File "/data/nfs/luojiangang/DeepSpeed/DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py", line 516, in <module>
    main()
  File "/data/nfs/luojiangang/DeepSpeed/DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py", line 425, in main
    out = trainer.generate_experience(prompts)
  File "/data/nfs/luojiangang/DeepSpeed/DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py", line 97, in generate_experience
    seq = self._generate_sequence(prompts)
  File "/data/nfs/luojiangang/DeepSpeed/DeepSpeedExamples-master/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py", line 73, in _generate_sequence
    seq = self.actor_model.module.generate(prompts,
                                           max_length=max_min_length,
                                           min_length=max_min_length)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/hybrid_engine.py", line 258, in generate
    self.unfuse_lora_weight()
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/hybrid_engine.py", line 144, in unfuse_lora_weight
    self._unfuse_lora(self.layer_params[layer_id], self.lora_params[layer_id])
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/hybrid_engine.py", line 140, in _unfuse_lora
    weight.data -= lora_scaling * torch.matmul(lora_left_weight.t(), lora_right_weight)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx(handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`
```
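For context, the failing call is the generation step in `ppo_trainer.py` (line 73 in the traceback), where the total generation length is the prompt length plus `max_answer_seq_len`, so with both parameters at 1024 the actor has to generate sequences of roughly 2048 tokens. Below is a minimal sketch of that logic, reconstructed from the traceback (names taken from the frames above; this is an illustration, not the exact source):

```python
import torch

def _generate_sequence(actor_model, prompts, max_answer_seq_len):
    # Total length = prompt tokens + answer tokens; with max_prompt_seq_len
    # and max_answer_seq_len both set to 1024 this is about 2048 tokens.
    max_min_length = max_answer_seq_len + prompts.shape[1]

    with torch.no_grad():
        # This is the generate() call that ends in
        # CUBLAS_STATUS_EXECUTION_FAILED once the lengths are increased.
        seq = actor_model.module.generate(prompts,
                                          max_length=max_min_length,
                                          min_length=max_min_length)
    return seq
```

So the larger settings mainly increase the sequence length the hybrid engine has to handle during generation; the error itself only surfaces later, in the LoRA unfuse matmul inside `hybrid_engine.py`.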
Is this a bug? Or are there other ways to make the program support longer answer_seq and prompt_seq lengths? We look forward to your reply.

lljjgg · Apr 14 '23 06:04