TextRL

Implementation of ChatGPT-style RLHF (Reinforcement Learning from Human Feedback) for any generation model in Hugging Face's transformers library (bloomz-176B/bloom/gpt/bart/T5/MetaICL)

Results: 4 TextRL issues

I followed the example at https://voidful.dev/jupyter/2021/07/25/textrl-elon-musk.html and wondered why the batch size is larger than `update_interval`, so I modified it as follows: **before:** `agent = actor.agent_ppo(update_interval=10, minibatch_size=2000, epochs=20)` **after:** `agent = actor.agent_ppo(update_interval=100, minibatch_size=10,...
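To make the relationship between the two settings in this question concrete, here is a rough sketch. It is not TextRL's code or that of its RL backend. It assumes a generic PPO-style loop in which `update_interval` transitions are collected per update and then replayed for `epochs` passes in minibatches of `minibatch_size`. The helper `gradient_steps_per_update` is hypothetical and exists only for illustration.

```python
import math


def gradient_steps_per_update(update_interval: int, minibatch_size: int, epochs: int) -> int:
    """Approximate optimizer steps per PPO update (hypothetical helper).

    Assumes each update sees `update_interval` transitions, replays them for
    `epochs` passes, and consumes `minibatch_size` samples per optimizer step.
    """
    total_samples = update_interval * epochs
    return math.ceil(total_samples / minibatch_size)


# Settings quoted as "before" in the question above.
before_steps = gradient_steps_per_update(update_interval=10, minibatch_size=2000, epochs=20)
print(before_steps)  # 1: only 200 samples are available for a 2000-sample minibatch
```

Under these assumptions, a minibatch larger than the collected data leaves roughly one optimizer step per update, which may be why the poster tried a smaller `minibatch_size` with a larger `update_interval`.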

Nice repo!!! It seems that the default parameter for the policy freezes all the layers of the language model being used and only updates the `lm_head`. I tried...

Nice repo!! I completed the training using the code examples and am now making predictions on the test set. But I found that using `actor.predict` to obtain the output of the...

I tried to create a notebook in the examples folder for a token classification problem. Please help me develop this.