Yutong

2 issues by Yutong

Thanks for the nice work. I'd like to ask for some help with the quick start. First, could you please give some instructions for running single RL in a Gym-based env,...

### 🐛 Describe the bug

Hello, here is a bug similar to [issues-3534](https://github.com/hpcaitech/ColossalAI/issues/3534). Using the default Anthropic/hh-rlhf dataset with:

- pretrain_model: bigscience/bloom-1b1
- batch_size: 1
- max_epochs: 1
- max_len: 512
- loss_fn: log_sig

the loss is random...
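The snippet does not define `loss_fn: log_sig`. This sketch assumes it selects the common pairwise log-sigmoid ranking loss used for reward-model training, computed as the mean of -log(sigmoid(r_chosen - r_rejected)). The function name `log_sig_loss` is hypothetical and is not ColossalAI's API. The sketch helps explain the reported symptom. If the model gives a chosen response and its rejected counterpart the same reward, each pair contributes log 2 (about 0.693). A loss that stays near that value therefore suggests the reward model is not separating the pairs.

```python
import math
from typing import Sequence


def log_sig_loss(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Hypothetical helper: mean pairwise log-sigmoid loss, -log(sigmoid(r_c - r_r)).

    Assumes `log_sig` means this standard pairwise ranking loss; not taken
    from any library.
    """
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("need equal-length, non-empty reward lists")
    total = 0.0
    for r_c, r_r in zip(chosen, rejected):
        margin = r_c - r_r
        # -log(sigmoid(m)) == log(1 + exp(-m)); split by sign so exp() never overflows
        if margin >= 0:
            total += math.log1p(math.exp(-margin))
        else:
            total += -margin + math.log1p(math.exp(margin))
    return total / len(chosen)
```

For example, `log_sig_loss([0.0], [0.0])` returns about 0.693, which is the "no preference learned" baseline. A large positive margin drives the loss toward 0.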
