rl-mpc-locomotion
Observation mismatch
The observation used in training does not match the one used in deployment. base_pos should be removed from the observation.
I was trying to align the RL reward with the MPC cost, but it turns out it's better to go without position tracking in both stages.
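One way to avoid this kind of mismatch is to build the observation from a single shared field list in both training and deployment. The snippet below is only a minimal sketch of that idea: the field names in `OBS_FIELDS`, the `build_observation` helper, and the dictionary-style state are all hypothetical and are not taken from this repository.

```python
from typing import Dict, List, Sequence

# Hypothetical field names; the actual state layout in the repo may differ.
# base_pos is deliberately not listed, so it never reaches the policy.
OBS_FIELDS = ("base_quat", "base_lin_vel", "base_ang_vel", "commands")


def build_observation(state: Dict[str, Sequence[float]]) -> List[float]:
    """Flatten the selected state fields into one observation vector.

    Calling this same function from both the training and the deployment
    code keeps the observation layout identical in the two stages.
    """
    obs: List[float] = []
    for name in OBS_FIELDS:
        obs.extend(float(x) for x in state[name])
    return obs


if __name__ == "__main__":
    state = {
        "base_pos": [1.0, 2.0, 0.3],  # present in the state, ignored by the observation
        "base_quat": [0.0, 0.0, 0.0, 1.0],
        "base_lin_vel": [0.1, 0.0, 0.0],
        "base_ang_vel": [0.0, 0.0, 0.2],
        "commands": [0.5, 0.0, 0.0],
    }
    print(len(build_observation(state)))
```

Because the field list lives in one place, dropping or adding a field changes the observation for both stages at once.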