慢半拍

Results: 20 comments by 慢半拍

Same error when using LLaMA as the actor with ZeRO stage = 0.

> Same error when using LLaMA as the actor with ZeRO stage = 0.

When ZeRO stage = 0, removing the enable_hybrid_engine option works.
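For reference, a minimal sketch of the DeepSpeed config this corresponds to (the "hybrid_engine" block follows DeepSpeed's hybrid-engine config section; the surrounding values are placeholders, not the repo's actual settings):

```
# Sketch of a DeepSpeed config with the hybrid engine disabled, roughly what
# dropping --enable_hybrid_engine from the step 3 script amounts to. The
# surrounding values are placeholders.
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 0},
    "hybrid_engine": {"enabled": False},
}
```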

> 1. The reason step 3 pads on the left is that sampling requires batched inference; left padding ensures the tokens immediately before the predicted next token are not padding, which is also more reasonable.
> 2. The reward computation is correct. The predicted reward uses only the score at the last token of the response in [prompt, response]. Training sees [prompt, response, padding], while inference sees [padding, prompt, response]; this should be more reasonable than changing inference to [prompt, padding, response].
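To make the two padding layouts concrete, here is a small sketch (the checkpoint name is a placeholder, and the random per-token scores stand in for a real reward head):

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "huggyllama/llama-7b"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
tokenizer.pad_token = tokenizer.unk_token  # LLaMA ships without a pad token

# Sampling (step 3): left padding gives [padding, prompt], so the tokens right
# before the next predicted token are real prompt tokens, not padding.
tokenizer.padding_side = "left"
gen_batch = tokenizer(["short prompt", "a much longer prompt"],
                      return_tensors="pt", padding=True)
responses = model.generate(**gen_batch, max_new_tokens=16)

# Reward: training batches look like [prompt, response, padding] (right
# padding), and only the score at the last real token is used.
tokenizer.padding_side = "right"
rm_batch = tokenizer(["prompt plus response A", "prompt plus a longer response B"],
                     return_tensors="pt", padding=True)
scores = torch.randn(rm_batch["input_ids"].shape)      # stand-in reward head output
last_idx = rm_batch["attention_mask"].sum(dim=-1) - 1  # last non-padding position
reward = scores.gather(1, last_idx.unsqueeze(1)).squeeze(1)  # one scalar per sequence
```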

> Why set tokenizer.pad_token_id = 0? In the LLaMA model vocabulary, pad_token "&lt;pad&gt;" is 3 and unk_token "&lt;unk&gt;" is 0. Why not set it to 3 here? I think it should be set to tokenizer.pad_token_id =...
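Rather than hard-coding either id, it may be safer to inspect the tokenizer, since the special-token ids differ between LLaMA variants (a sketch with a placeholder checkpoint):

```
from transformers import AutoTokenizer

# Placeholder checkpoint; special-token ids differ across LLaMA variants, so
# inspect them instead of hard-coding pad_token_id = 0.
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
print(tokenizer.unk_token, tokenizer.unk_token_id)  # typically <unk> 0
print(tokenizer.pad_token, tokenizer.pad_token_id)  # None for the original LLaMA

# If the vocabulary really has a dedicated <pad> token (id 3 in the variant
# discussed above), prefer it over reusing id 0 (<unk>):
if tokenizer.pad_token is None and "<pad>" in tokenizer.get_vocab():
    tokenizer.pad_token = "<pad>"
```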

1. Steps 1 and 2 can be run and tested on their own, but step 3 depends on the outputs of steps 1 and 2. 2. The chatbot in this...

I did not actually run into this problem myself. There is some discussion of this issue here that may be useful: https://discuss.pytorch.org/t/runtimeerror-element-0-of-variables-does-not-require-grad-and-does-not-have-a-grad-fn/11074/46
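That error generally means the loss tensor carries no autograd history, e.g. every parameter on the path was frozen or the forward ran under torch.no_grad(). A minimal reproduction and fix:

```
import torch

model = torch.nn.Linear(4, 1)

# Reproduce: freezing every parameter leaves the loss without a grad_fn, so
# backward() raises "element 0 of tensors does not require grad".
for p in model.parameters():
    p.requires_grad = False
loss = model(torch.randn(2, 4)).sum()
# loss.backward()  # RuntimeError

# Fix: make sure the parameters you intend to train require gradients.
for p in model.parameters():
    p.requires_grad = True
loss = model(torch.randn(2, 4)).sum()
loss.backward()  # works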

> and how to install alpaca-rlhf

1. Download this repo.
2. Enter the ./alpaca_rlhf directory.
3. Run the step1, step2, and step3 commands in the Step by Step section of the README.

> The following code works well.
>
> ```
> if (step + 1) % 100 == 0:
>     reward_score, rejected_scores, acc, score_std = evaluation_reward(rm_model, eval_dataloader)
>     if args.global_rank ==...
> ```

In addition, there is another bug: rm_model.train() should be placed inside the step loop: ![image](https://github.com/l294265421/alpaca-rlhf/assets/9948265/aecb87c7-3af9-4c62-ba73-16787c9126c1)
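A sketch of the fix, under the assumption that evaluation_reward() switches the model to eval mode (the backward/step calls follow the DeepSpeed engine API; the forward signature and loss access are assumptions about the reward model):

```
# Sketch of the corrected step 2 loop: evaluation_reward() presumably calls
# rm_model.eval(), so rm_model.train() must run inside the step loop or every
# step after the first evaluation trains with dropout disabled.
for epoch in range(args.num_train_epochs):
    for step, batch in enumerate(train_dataloader):
        rm_model.train()  # restore train mode after any previous evaluation
        outputs = rm_model(**batch, use_cache=False)  # assumed forward signature
        loss = outputs["loss"]
        rm_model.backward(loss)  # DeepSpeed engine API
        rm_model.step()

        if (step + 1) % 100 == 0:
            reward_score, rejected_scores, acc, score_std = evaluation_reward(
                rm_model, eval_dataloader)  # flips the model to eval mode
```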