RLHF
A collection of links, tutorials, and best practices for collecting data and building an end-to-end RLHF system to fine-tune generative AI models.
The following error occurred while running cell 10 in **6. Tune language model using PPO with our preference model**. After adding `__init__.py` to `/content/trlx/examples/summarize_rlhf/reward_model/`, I still get the same error....
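If adding `__init__.py` alone doesn't make the package importable in Colab, one common workaround (an assumption here, not the repository's documented fix) is to put the example directory on Python's module search path before importing:

```python
import sys

# Hypothetical fix: add the summarize_rlhf example directory to sys.path
# so that `reward_model` resolves as a top-level package.
sys.path.append("/content/trlx/examples/summarize_rlhf")

# Import path assumed from the trlX example layout
# (examples/summarize_rlhf/reward_model/reward_model.py).
from reward_model.reward_model import GPTRewardModel
```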
I would like to modify the code in this repository and use it as lecture material.
Where can I find this `REWARD_CHECKPOINT_PATH` as a `.bin` file?

```python
      3 rw_tokenizer.pad_token = rw_tokenizer.eos_token
      4 rw_model = GPTRewardModel(SFT_MODEL_PATH)
----> 5 rw_model.load_state_dict(REWARD_CHECKPOINT_PATH)
      6 rw_model.half()
      7 rw_model.eval()
```
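Note that PyTorch's `load_state_dict` expects a state-dict object, not a file path, so a `.bin` checkpoint is normally deserialized with `torch.load` first. A minimal sketch of the usual loading pattern; the model id and checkpoint path below are assumptions based on the trlX summarize_rlhf example, not confirmed values:

```python
import torch
from reward_model.reward_model import GPTRewardModel  # import path is an assumption

# Hypothetical paths: the trlX summarize_rlhf example downloads the reward
# checkpoint (a saved state dict in .bin format) into reward_model/rm_checkpoint/.
SFT_MODEL_PATH = "CarperAI/openai_summarize_tldr_sft"
REWARD_CHECKPOINT_PATH = "reward_model/rm_checkpoint/pytorch_model.bin"

rw_model = GPTRewardModel(SFT_MODEL_PATH)
# Deserialize the checkpoint file into a state dict before loading it;
# passing the path string directly is what triggers the error above.
rw_model.load_state_dict(torch.load(REWARD_CHECKPOINT_PATH))
rw_model.half()
rw_model.eval()
```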
Why are the rewards truncated in the `GPTRewardModel` class? What is the reason, and where can I find more information about it?

```python
# Retrieve first index where trajectories diverge
divergence_ind...
```
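The underlying idea is that the chosen and rejected sequences share an identical prompt prefix (and may end in padding), and identical tokens carry no preference signal, so the pairwise loss is computed only over the span where the two trajectories actually differ. A minimal sketch of that truncation, not the repository's exact code (tensor names and the loss form are assumptions):

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_ids, rejected_ids,
                         chosen_rewards, rejected_rewards, pad_id):
    """Hypothetical pairwise reward-model loss illustrating why per-token
    rewards are truncated to the divergent span of each chosen/rejected pair.

    chosen_ids / rejected_ids:        (batch, seq_len) token ids sharing a prompt prefix
    chosen_rewards / rejected_rewards: (batch, seq_len) per-token scalar rewards
    """
    batch_size = chosen_ids.shape[0]
    loss = 0.0
    for i in range(batch_size):
        # Retrieve first index where the trajectories diverge: the shared
        # prompt tokens are identical, so comparing rewards there is meaningless.
        diff = (chosen_ids[i] != rejected_ids[i]).nonzero()
        divergence_ind = diff[0].item() if len(diff) > 0 else 0

        # Last real (non-pad) token of each sequence; score up to the longer one.
        c_end = (chosen_ids[i] != pad_id).nonzero()[-1].item() + 1
        r_end = (rejected_ids[i] != pad_id).nonzero()[-1].item() + 1
        end_ind = max(c_end, r_end)

        # Truncate rewards to the divergent span and push chosen > rejected.
        c_trunc = chosen_rewards[i][divergence_ind:end_ind]
        r_trunc = rejected_rewards[i][divergence_ind:end_ind]
        loss += -F.logsigmoid(c_trunc - r_trunc).mean()
    return loss / batch_size
```

Scoring only the divergent slice keeps the identical prompt tokens and padding from diluting the preference signal in the sigmoid loss.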