慢半拍

Results: 20 comments by 慢半拍

Same error when using LLaMA as the actor with ZeRO stage = 0.

> Same error when using LLaMA as the actor with ZeRO stage = 0.

When ZeRO stage = 0, removing the enable_hybrid_engine option works.
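For reference, a minimal sketch of the DeepSpeed config this corresponds to (the "hybrid_engine" block follows DeepSpeed's hybrid-engine config section; the surrounding values are placeholders, not the repo's actual settings):

```
# Sketch of a DeepSpeed config with the hybrid engine disabled, roughly what
# dropping --enable_hybrid_engine from the step 3 script amounts to. The
# surrounding values are placeholders.
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 0},
    "hybrid_engine": {"enabled": False},
}
```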

> 1. The reason step 3 pads on the left is that sampling requires batched inference; left padding ensures the tokens immediately before the predicted next token are not padding, which is also more reasonable.
> 2. The reward computation is correct. The predicted reward uses only the score at the last token of the response in [prompt, response]. Training sees [prompt, response, padding], while inference sees [padding, prompt, response]; this should be more reasonable than changing inference to [prompt, padding, response].
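To make the two padding layouts concrete, here is a small sketch (the checkpoint name is a placeholder, and the random per-token scores stand in for a real reward head):

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "huggyllama/llama-7b"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
tokenizer.pad_token = tokenizer.unk_token  # LLaMA ships without a pad token

# Sampling (step 3): left padding gives [padding, prompt], so the tokens right
# before the next predicted token are real prompt tokens, not padding.
tokenizer.padding_side = "left"
gen_batch = tokenizer(["short prompt", "a much longer prompt"],
                      return_tensors="pt", padding=True)
responses = model.generate(**gen_batch, max_new_tokens=16)

# Reward: training batches look like [prompt, response, padding] (right
# padding), and only the score at the last real token is used.
tokenizer.padding_side = "right"
rm_batch = tokenizer(["prompt plus response A", "prompt plus a longer response B"],
                     return_tensors="pt", padding=True)
scores = torch.randn(rm_batch["input_ids"].shape)      # stand-in reward head output
last_idx = rm_batch["attention_mask"].sum(dim=-1) - 1  # last non-padding position
reward = scores.gather(1, last_idx.unsqueeze(1)).squeeze(1)  # one scalar per sequence
```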

> Why set tokenizer.pad_token_id = 0? In the LLaMA model vocabulary, pad_token "&lt;pad&gt;" is 3 and unk_token "&lt;unk&gt;" is 0. Why not set it to 3 here? I think it should be set to tokenizer.pad_token_id =...
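Rather than hard-coding either id, it may be safer to inspect the tokenizer, since the special-token ids differ between LLaMA variants (a sketch with a placeholder checkpoint):

```
from transformers import AutoTokenizer

# Placeholder checkpoint; special-token ids differ across LLaMA variants, so
# inspect them instead of hard-coding pad_token_id = 0.
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
print(tokenizer.unk_token, tokenizer.unk_token_id)  # typically <unk> 0
print(tokenizer.pad_token, tokenizer.pad_token_id)  # None for the original LLaMA

# If the vocabulary really has a dedicated <pad> token (id 3 in the variant
# discussed above), prefer it over reusing id 0 (<unk>):
if tokenizer.pad_token is None and "<pad>" in tokenizer.get_vocab():
    tokenizer.pad_token = "<pad>"
```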

1. Steps 1 and 2 can be run and tested on their own, but step 3 depends on the outputs of steps 1 and 2. 2. The chatbot in this...

I did not actually run into this problem myself. There is some discussion of this issue here that may be useful: https://discuss.pytorch.org/t/runtimeerror-element-0-of-variables-does-not-require-grad-and-does-not-have-a-grad-fn/11074/46
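That error generally means the loss tensor carries no autograd history, e.g. every parameter on the path was frozen or the forward ran under torch.no_grad(). A minimal reproduction and fix:

```
import torch

model = torch.nn.Linear(4, 1)

# Reproduce: freezing every parameter leaves the loss without a grad_fn, so
# backward() raises "element 0 of tensors does not require grad".
for p in model.parameters():
    p.requires_grad = False
loss = model(torch.randn(2, 4)).sum()
# loss.backward()  # RuntimeError

# Fix: make sure the parameters you intend to train require gradients.
for p in model.parameters():
    p.requires_grad = True
loss = model(torch.randn(2, 4)).sum()
loss.backward()  # works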

> and how to install alpaca-rlhf

1. Download this repo.
2. Enter the ./alpaca_rlhf directory.
3. Run the step1, step2, and step3 commands in the Step by Step section of the README.

> The following code works well.
>
> ```
> if (step + 1) % 100 == 0:
>     reward_score, rejected_scores, acc, score_std = evaluation_reward(rm_model, eval_dataloader)
>     if args.global_rank ==...
> ```

In addition, there is another bug: rm_model.train() should be placed inside the step loop: ![image](https://github.com/l294265421/alpaca-rlhf/assets/9948265/aecb87c7-3af9-4c62-ba73-16787c9126c1)
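A sketch of the fix, under the assumption that evaluation_reward() switches the model to eval mode (the backward/step calls follow the DeepSpeed engine API; the forward signature and loss access are assumptions about the reward model):

```
# Sketch of the corrected step 2 loop: evaluation_reward() presumably calls
# rm_model.eval(), so rm_model.train() must run inside the step loop or every
# step after the first evaluation trains with dropout disabled.
for epoch in range(args.num_train_epochs):
    for step, batch in enumerate(train_dataloader):
        rm_model.train()  # restore train mode after any previous evaluation
        outputs = rm_model(**batch, use_cache=False)  # assumed forward signature
        loss = outputs["loss"]
        rm_model.backward(loss)  # DeepSpeed engine API
        rm_model.step()

        if (step + 1) % 100 == 0:
            reward_score, rejected_scores, acc, score_std = evaluation_reward(
                rm_model, eval_dataloader)  # flips the model to eval mode
```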