brax
Method to train with behavior cloning
Hello brax team,
I have recently been trying to train a humanoid robot to squat. To be honest, it is hard to train with pure PPO and self-defined reward functions, so I am trying to use the BC (behavior cloning) algorithm from brax.
My question is:
The algorithm needs a teacher policy to train the model online. Is there a simple way to generate a teacher policy from self-collected expert data? For example, could the corresponding observation and action trajectories be passed to a train function such as ppo.train() to generate an expert policy?
If not, is there a recommended way to generate a teacher policy that bc.train() can accept?
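To make the question concrete, here is a minimal sketch in plain JAX of the kind of offline fit I have in mind: a small MLP regressed onto recorded (observation, action) pairs. Everything in it is my own assumption and not brax API: the helper names (init_mlp, mlp_apply, fit_teacher), the network size, the MSE loss, and the assumption that expert actions are normalized to [-1, 1]. The random arrays only stand in for my recorded trajectories. I do not know whether a function like teacher_policy below matches what bc.train() expects, which is really what I am asking.

```python
import jax
import jax.numpy as jnp


def init_mlp(key, sizes):
    """Initialise weights and biases for a small MLP (hypothetical helper)."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        w = jax.random.normal(sub, (n_in, n_out)) * jnp.sqrt(2.0 / n_in)
        params.append((w, jnp.zeros(n_out)))
    return params


def mlp_apply(params, obs):
    """Map observations to actions squashed into [-1, 1] (assumed action range)."""
    x = obs
    for w, b in params[:-1]:
        x = jax.nn.relu(x @ w + b)
    w, b = params[-1]
    return jnp.tanh(x @ w + b)


def mse_loss(params, obs, act):
    """Mean squared error between predicted and expert actions."""
    return jnp.mean((mlp_apply(params, obs) - act) ** 2)


@jax.jit
def sgd_step(params, obs, act, lr):
    """One full-batch gradient descent step; returns new params and the pre-step loss."""
    loss, grads = jax.value_and_grad(mse_loss)(params, obs, act)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, loss


def fit_teacher(expert_obs, expert_act, hidden=(64, 64), steps=500, lr=0.05, seed=0):
    """Fit the MLP to expert data by supervised regression (hypothetical helper)."""
    sizes = (expert_obs.shape[-1], *hidden, expert_act.shape[-1])
    params = init_mlp(jax.random.PRNGKey(seed), sizes)
    for _ in range(steps):
        params, loss = sgd_step(params, expert_obs, expert_act, lr)
    return params, float(loss)


# Placeholder data standing in for self-collected expert trajectories.
key_obs, key_w = jax.random.split(jax.random.PRNGKey(1))
demo_obs = jax.random.normal(key_obs, (256, 8))
demo_act = jnp.tanh((demo_obs @ jax.random.normal(key_w, (8, 3))) * 0.5)

# Loss of an untrained network, for comparison with the fitted one.
initial_loss = float(mse_loss(init_mlp(jax.random.PRNGKey(0), (8, 64, 64, 3)), demo_obs, demo_act))
teacher_params, final_loss = fit_teacher(demo_obs, demo_act)


def teacher_policy(obs):
    """Deterministic teacher: observation batch in, action batch out."""
    return mlp_apply(teacher_params, obs)
```

If something like this is a reasonable starting point, I would mainly need to know how to wrap teacher_policy (or its parameters) into the form that bc.train() takes.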
Thank you for your time and consideration.
Tau