OpenRLHF
Actor-Critic-Model
If I understand the current PPO code correctly, it instantiates completely separate actor and critic models, with no layers shared between them. (Please correct me if that is wrong.)
Would it instead be possible to add a critic output head to the actor model, i.e. share all but the last layer between actor and critic, or share some configurable number of layers?
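To make the idea concrete, here is a minimal sketch of what I mean, assuming plain PyTorch. The class `SharedActorCritic` and all of its parameter names are hypothetical and are not part of OpenRLHF; the small MLP trunk is only a stand-in for the pretrained transformer layers that would actually be shared.

```python
import torch
import torch.nn as nn


class SharedActorCritic(nn.Module):
    """Hypothetical model: one shared trunk, a policy head, and a value head."""

    def __init__(self, vocab_size: int = 100, hidden_size: int = 32):
        super().__init__()
        # Shared layers (stand-in for the transformer backbone of the actor).
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.trunk = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
        )
        # Actor head: per-token logits over the vocabulary.
        self.policy_head = nn.Linear(hidden_size, vocab_size)
        # Critic head: one scalar value estimate per token.
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, input_ids: torch.Tensor):
        hidden = self.trunk(self.embed(input_ids))    # (batch, seq, hidden)
        logits = self.policy_head(hidden)             # (batch, seq, vocab)
        values = self.value_head(hidden).squeeze(-1)  # (batch, seq)
        return logits, values


if __name__ == "__main__":
    model = SharedActorCritic()
    input_ids = torch.randint(0, 100, (2, 5))
    logits, values = model(input_ids)
    print(logits.shape, values.shape)
```

With this layout a single forward pass yields both the policy logits and the value estimates, so only one copy of the shared layers has to be kept in memory. The trade-off is that the value loss also sends gradients into the shared layers, so the policy and value losses would presumably need to be combined with some weighting coefficient.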