
Method to train with behavior cloning

Open Excurrybang opened this issue 6 months ago • 0 comments

Hello brax team,

I have recently been trying to train a humanoid robot to squat. To be honest, this is hard to do with the pure PPO algorithm and self-defined reward functions, so I am trying to use the BC (behavior cloning) algorithm from brax.

My question is:

The algorithm needs a teacher policy to train the model online. Is there a simple way to generate a teacher policy from self-collected expert data? For example, could the corresponding observation and action trajectories be passed to a train function such as ppo.train() to generate an expert policy?

If not, is there a recommended way to generate a teacher policy that bc.train() accepts?
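
To make the question concrete, here is a minimal sketch of what I mean by generating a teacher policy from self-collected data. It is plain JAX and does not use any brax API: a small MLP is fit to (observation, action) pairs by supervised regression. The helper names (init_mlp, policy, fit_teacher), the network sizes, the learning rate, and the random placeholder data are all my own assumptions for illustration, and actions are assumed to be normalized to [-1, 1]. How to wrap the resulting parameters into a teacher that bc.train() accepts is exactly the part I am unsure about.

```python
import jax
import jax.numpy as jnp


def init_mlp(key, sizes):
    """Create a list of (weight, bias) pairs for an MLP with the given layer sizes."""
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        w = jax.random.normal(sub, (d_in, d_out)) * jnp.sqrt(2.0 / d_in)
        params.append((w, jnp.zeros(d_out)))
    return params


def policy(params, obs):
    """Deterministic policy: observation -> action, squashed to [-1, 1] (assumption)."""
    x = obs
    for w, b in params[:-1]:
        x = jnp.tanh(x @ w + b)
    w, b = params[-1]
    return jnp.tanh(x @ w + b)


def bc_loss(params, obs, act):
    """Mean squared error between predicted and expert actions."""
    return jnp.mean((policy(params, obs) - act) ** 2)


@jax.jit
def sgd_step(params, obs, act, lr):
    """One full-batch gradient descent step on the regression loss."""
    loss, grads = jax.value_and_grad(bc_loss)(params, obs, act)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, loss


def fit_teacher(key, obs, act, hidden=(64, 64), steps=500, lr=0.05):
    """Fit the MLP to expert (obs, act) pairs; returns params and loss before/after."""
    params = init_mlp(key, (obs.shape[-1], *hidden, act.shape[-1]))
    loss_before = bc_loss(params, obs, act)
    loss = loss_before
    for _ in range(steps):
        params, loss = sgd_step(params, obs, act, lr)
    return params, loss_before, loss


# Placeholder data standing in for self-collected expert trajectories,
# flattened to shape (num_samples, obs_dim) and (num_samples, act_dim).
key = jax.random.PRNGKey(0)
k_obs, k_w, k_init = jax.random.split(key, 3)
expert_obs = jax.random.normal(k_obs, (256, 8))
expert_act = jnp.tanh((expert_obs @ jax.random.normal(k_w, (8, 3))) * 0.5)

teacher_params, loss_before, loss_after = fit_teacher(k_init, expert_obs, expert_act)
```

With real data, expert_obs and expert_act would be replaced by the recorded trajectories, and policy(teacher_params, obs) would then play the role of the teacher.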

Thank you for your time and consideration.

Tau

Excurrybang, May 21 '25 15:05