
Actor-Critic-Model

Open mgerstgrasser opened this issue 4 months ago • 5 comments

If I understand the current PPO code correctly, it instantiates completely separate actor and critic models, with no layers shared between them. (Please correct me if that is wrong.)

Instead, would it be possible to add a critic output head to the actor model, i.e. to share all but the last layer between actor and critic (or to share any chosen number of layers)?
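
To make the idea concrete, here is a rough PyTorch sketch of what I have in mind. This is not OpenRLHF code: the class name `SharedActorCritic`, the toy trunk, and the sizes are all made up for illustration, and a real version would use the pretrained transformer as the shared trunk.

```python
import torch
import torch.nn as nn


class SharedActorCritic(nn.Module):
    """Hypothetical actor with an extra value head on a shared trunk."""

    def __init__(self, vocab_size: int = 100, hidden_size: int = 32):
        super().__init__()
        # Toy stand-in for the shared transformer layers (assumption:
        # in practice this would be the pretrained language model body).
        self.trunk = nn.Sequential(
            nn.Embedding(vocab_size, hidden_size),
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
        )
        # Actor head: per-token logits over the vocabulary.
        self.policy_head = nn.Linear(hidden_size, vocab_size)
        # Critic head: one scalar value estimate per token.
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, input_ids: torch.Tensor):
        hidden = self.trunk(input_ids)                # (batch, seq, hidden)
        logits = self.policy_head(hidden)             # (batch, seq, vocab)
        values = self.value_head(hidden).squeeze(-1)  # (batch, seq)
        return logits, values


model = SharedActorCritic()
input_ids = torch.randint(0, 100, (2, 5))  # batch of 2 sequences, 5 tokens each
logits, values = model(input_ids)
print(logits.shape, values.shape)
```

Both heads read the same hidden states, so a single forward pass yields the policy logits and the value estimates, and the policy and value losses would both backpropagate into the shared trunk.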

mgerstgrasser, Mar 02 '24 22:03