
[BUG]: PPO errors

Open · guijuzhejiang opened this issue 2 years ago · 4 comments

πŸ› Describe the bug

When I train stage 3 (PPO) in Chat, the following error occurs:

/home/zzg/workspace/pycharm/ColossalAI/applications/Chat/examples/train_prompts_jp.py:303 in
   300   parser.add_argument('--max_datasets_size', type=int, default=None)
   301   parser.add_argument('--max_len', type=int, default=512)
   302   args = parser.parse_args()
❱  303   main(args)
   304

/home/zzg/workspace/pycharm/ColossalAI/applications/Chat/examples/train_prompts_jp.py:259 in main
   256       eos_token_id=tokenizer_actor.eos_token_id,
   257   )
   258
❱  259   trainer.fit(prompt_dataloader=prompt_dataloader,
   260               pretrain_dataloader=pretrain_dataloader,
   261               num_episodes=args.num_episodes,
   262               max_timesteps=args.max_timesteps,

/home/zzg/miniconda3/envs/py39_DL_cu118/lib/python3.9/site-packages/coati/trainer/base.py:125 in fit
   122                 if time % update_timesteps == 0:
   123                     self.experience_maker.initial_model.to('cpu')
   124                     self.experience_maker.reward_model.to('cpu')
❱  125                     self._learn()
   126                     self.replay_buffer.clear()
   127             self._on_episode_end(episode)
   128         self._on_fit_end()

/home/zzg/miniconda3/envs/py39_DL_cu118/lib/python3.9/site-packages/coati/trainer/base.py:93 in _learn
    90                 for experience in pbar:
    91                     self._on_learn_batch_start()
    92                     experience.to_device(device)
❱   93                     metrics = self.training_step(experience)
    94                     self._on_learn_batch_end(metrics, experience)
    95                     pbar.set_postfix(metrics)
    96                 self._on_learn_epoch_end(epoch)

/home/zzg/miniconda3/envs/py39_DL_cu118/lib/python3.9/site-packages/coati/trainer/ppo.py:103 in training_step
   100             label = batch['labels'].to(torch.cuda.current_device())[:,
   101             attention_mask = batch['attention_mask'].to(torch.cuda.cur
   102             ptx_log_probs = self.actor.get_base_model()(ptx, attention
❱  103             ptx_loss = self.ptx_loss_fn(ptx_log_probs.view(-1, ptx_log
   104             actor_loss = ptx_loss * self.ptx_coef + actor_loss * (1 -
   105
   106         self.strategy.backward(actor_loss, self.actor, self.actor_opti

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Episode [10/10]:  50%|█████     | 5/10 [02:32<02:32, 30.43s/it]
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1789656 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 1789657) of binary: /home/zzg/miniconda3/envs/py39_DL_cu118/bin/python

Environment

CUDA: 11.8, PyTorch: 1.13.1, transformers: 4.29.0.dev0, system: Ubuntu 22

guijuzhejiang · Apr 13 '23

A quick and dirty fix is to modify this line to be

ptx_loss = self.ptx_loss_fn(ptx_log_probs.contiguous().view(-1, ptx_log_probs.size(-1)), label.contiguous().view(-1))
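
For background, .view() only works when the requested shape is compatible with the tensor's existing memory layout, and the error message indicates ptx_log_probs is not laid out contiguously along the flattened dimensions. A minimal sketch that reproduces the same failure and shows both workarounds (the shapes are illustrative only, not the actual model outputs):

import torch

# A (batch, seq, vocab)-like tensor made non-contiguous, e.g. via a transpose.
log_probs = torch.randn(2, 4, 8).transpose(0, 1)

try:
    log_probs.view(-1, log_probs.size(-1))                      # raises the RuntimeError above
except RuntimeError as err:
    print(err)                                                  # "view size is not compatible ..."

flat = log_probs.contiguous().view(-1, log_probs.size(-1))      # copy to contiguous memory, then view
flat = log_probs.reshape(-1, log_probs.size(-1))                # equivalent: reshape copies only if needed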

JThh · Apr 13 '23

@JThh Thank you, modifying it here does fix the error. Using reshape also works, but reshape seems less efficient.
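
That said, torch.reshape falls back to a copy only when the data is not already contiguous, so in this case it should cost roughly the same as .contiguous().view(). A quick check with illustrative shapes:

import torch

x = torch.randn(2, 4, 8).transpose(0, 1)           # non-contiguous, as in the failing case
a = x.contiguous().view(-1, x.size(-1))             # explicit copy, then view
b = x.reshape(-1, x.size(-1))                       # also copies here, since x is non-contiguous
print(torch.equal(a, b))                            # True: both give the same flattened tensor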

guijuzhejiang · Apr 13 '23

@JThh In addition, how do you recommend setting these parameters: num_episodes, max_epochs, max_timesteps, and update_timesteps?

guijuzhejiang · Apr 13 '23

Hi, I recommend going with the defaults or adjusting them based on your needs. Due to our limited training capacity, we cannot provide best-practice values right now!
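
For orientation, the coati/trainer/base.py frames in the traceback above hint at how these parameters interact: fit runs num_episodes episodes of max_timesteps experience-collection steps each, triggers _learn() every update_timesteps steps, and _learn() then makes max_epochs passes over the replay buffer. A rough, simplified model of that schedule (inferred from the traceback, not the actual coati implementation; the example values below are assumptions):

def ppo_schedule(num_episodes, max_timesteps, update_timesteps, max_epochs):
    """Count collection steps, PPO updates, and buffer passes for one setting.
    Simplified model of the trainer loop seen in the traceback above."""
    collection_steps = updates = buffer_passes = 0
    for _ in range(num_episodes):                   # each training episode
        for time in range(1, max_timesteps + 1):
            collection_steps += 1                   # one experience-generation step
            if time % update_timesteps == 0:        # point where _learn() is triggered
                updates += 1
                buffer_passes += max_epochs         # epochs over the collected replay buffer
    return collection_steps, updates, buffer_passes

# Hypothetical example: 10 episodes of 10 timesteps, updating every 10 steps,
# 1 PPO epoch per update -> 100 collection steps, 10 updates, 10 buffer passes.
print(ppo_schedule(10, 10, 10, 1))                  # (100, 10, 10)

In short, num_episodes * max_timesteps bounds the total amount of experience collected, update_timesteps sets how often PPO updates run on that experience, and max_epochs sets how many passes each update makes over the buffer.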

JThh · Apr 17 '23