PHC
Question about the logic of post_physics_step
Hi Luo, thanks for your great work! When I run the code, I notice that in the post_physics_step function, the logic of your code is:
- update progress buffer
- compute the imitation reward
- update the observation
- record the value of mpjpe, body_pos and body_pos_gt if in evaluation mode

So if we call the character's state at the current timestep s_t and the reference pose r_t, the imitation reward is computed from s_t and r_t, right? The new observation, as your paper suggests, should then be based on s_t and r_{t+1}. As a consequence, are the values of mpjpe, body_pos and body_pos_gt recorded during evaluation all based on s_t and r_{t+1}?
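
To make the concern concrete, here is a toy 1-D sketch of why the choice of reference index matters for the recorded metrics. It is not taken from the PHC code: the `reference` and `state` lists and the `tracking_error` helper are hypothetical, and the numbers are made up so that s_t is always 0.1 away from r_t.

```python
# Toy 1-D illustration; all names are hypothetical and not taken from the PHC code.
reference = [0.0, 1.0, 2.0, 3.0]  # r_0 .. r_3
state = [0.1, 1.1, 2.1]           # s_0 .. s_2, each 0.1 away from r_t


def tracking_error(t, ref_offset):
    """Absolute error between s_t and r_{t + ref_offset}."""
    return abs(state[t] - reference[t + ref_offset])


# Error measured against the same-step reference r_t (what the reward would use).
err_same_step = [round(tracking_error(t, 0), 3) for t in range(len(state))]

# Error measured against the next-step reference r_{t+1} (what the observation would use).
err_next_step = [round(tracking_error(t, 1), 3) for t in range(len(state))]

print(err_same_step)
print(err_next_step)
```

In this toy setup the error against r_t stays small while the error against r_{t+1} is dominated by how far the reference moves in one step, so if the evaluation metrics were paired with r_{t+1} they would overstate the tracking error.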