
Is AutoModelForCausalLMWithValueHead abandoned in PPOv2Trainer?

Open Sino-Huang opened this issue 1 year ago • 0 comments

System Info

In the PPOv2 example, the policy model is created directly with AutoModelForCausalLM.from_pretrained. Is it interchangeable with AutoModelForCausalLMWithValueHead.from_pretrained?

I also found that using AutoModelForCausalLMWithValueHead as the policy model in PPOv2 gives faster PPO training than using AutoModelForCausalLM. Why does this happen?

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder
  • [ ] My own task or dataset (give details below)

Reproduction

Change the creation of the policy model in the PPOv2 example from AutoModelForCausalLM.from_pretrained to AutoModelForCausalLMWithValueHead.from_pretrained.
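To put numbers on the speed difference, the two runs could be timed side by side. The following is only a minimal sketch: `time_steps` and `compare_step_times` are hypothetical helper names, not part of trl, and the two step callables are placeholders for whatever runs one training step with each policy class.

```python
import time
from statistics import mean


def time_steps(step_fn, n_steps=5, warmup=1):
    """Return the mean wall-clock seconds per call of step_fn.

    The first `warmup` calls are executed but not timed, so one-off
    costs (allocation, caching, compilation) do not skew the result.
    """
    for _ in range(warmup):
        step_fn()
    durations = []
    for _ in range(n_steps):
        start = time.perf_counter()
        step_fn()
        durations.append(time.perf_counter() - start)
    return mean(durations)


def compare_step_times(step_a, step_b, n_steps=5, warmup=1):
    """Time two step callables under identical settings and report both."""
    a = time_steps(step_a, n_steps=n_steps, warmup=warmup)
    b = time_steps(step_b, n_steps=n_steps, warmup=warmup)
    return {
        "a_sec_per_step": a,
        "b_sec_per_step": b,
        # Values above 1 mean configuration A is faster than B.
        "speedup_a_over_b": b / a,
    }
```

Here `step_a` would wrap one step with AutoModelForCausalLMWithValueHead as the policy and `step_b` one step with AutoModelForCausalLM, with the same batch size, sequence length, and seed. On a GPU, each step callable should synchronize the device before returning, since CUDA kernels run asynchronously and the timer would otherwise stop too early.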

Expected behavior

I would like to know what causes the difference in training speed.

Sino-Huang, Oct 06 '24 13:10