Has AutoModelForCausalLMWithValueHead been abandoned in PPOv2Trainer?
System Info
I saw that in the PPOv2 example, the policy model is created directly with AutoModelForCausalLM.from_pretrained.
I want to know whether it is interchangeable with AutoModelForCausalLMWithValueHead.from_pretrained.
I also found that using AutoModelForCausalLMWithValueHead in PPOv2 gives faster PPO training than using AutoModelForCausalLM, and I wonder why that happens.
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the examples folder
- [ ] My own task or dataset (give details below)
Reproduction
Change the creation of the policy model in the PPOv2 example from AutoModelForCausalLM.from_pretrained to AutoModelForCausalLMWithValueHead.from_pretrained, as in the sketch below.
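A minimal sketch of the swap, assuming a placeholder checkpoint name (the official example script loads the model from its own CLI arguments instead):

```python
from transformers import AutoModelForCausalLM
from trl import AutoModelForCausalLMWithValueHead

model_name = "EleutherAI/pythia-1b-deduped"  # placeholder; any causal LM checkpoint

# Original: plain causal LM, as in the PPOv2 example script
policy = AutoModelForCausalLM.from_pretrained(model_name)

# Modified: the same checkpoint wrapped with a scalar value head (trl class)
policy = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
```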
Expected behavior
I would like to know what causes the difference in training speed between the two model classes.