PHC
Question about the logic of post_physics_step
Hi Luo, thanks for your great work! When I run the code, I notice that in the post_physics_step function, the logic of your code is:
- update progress buffer
- compute the imitation reward
- update the observation
- record the values of mpjpe, body_pos and body_pos_gt if in evaluation mode
So if we denote the character's state at the current timestep by s_t and the reference pose by r_t, the imitation reward is computed from s_t and r_t, right? The new observation, as your paper suggests, should then be based on s_t and r_{t+1}. As a consequence, are the values of mpjpe, body_pos and body_pos_gt recorded during evaluation all based on s_t and r_{t+1}?
In your example, at time step t, the humanoid would have state s_t, and it needs to compute actions to match r_t.
The reward would be computed based on s_{t+1} and r_t, since you are judged based on the effect of your action.
So, in evaluation mode, we are recording s_{t+1} and r_t.
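A toy sketch may make this indexing concrete. It is not PHC's code: the `rollout` function, the 1-D state, and the absolute-error reward are hypothetical stand-ins, and the indices simply follow the convention used in the reply above (the action taken at step t targets r_t, and the reward judges the resulting state s_{t+1}).

```python
def rollout(reference, actions, state=0.0):
    """Toy 1-D rollout (hypothetical, not PHC's implementation).

    Returns one (state_index, reference_index, reward) record per step,
    showing which state and which reference frame each reward compares.
    """
    records = []
    for t, action in enumerate(actions):
        # The policy chose `action` from (s_t, r_t); physics moves s_t -> s_{t+1}.
        state = state + action
        # After the physics step, the new state is judged against the same target r_t.
        reward = -abs(state - reference[t])
        # This is also the pair an evaluation log would record: s_{t+1} and r_t.
        records.append((t + 1, t, reward))
    return records


reference = [1.0, 2.0, 3.0]   # r_0, r_1, r_2
actions = [1.0, 1.0, 0.5]     # the last action undershoots its target
for state_idx, ref_idx, reward in rollout(reference, actions):
    print(f"s_{state_idx} vs r_{ref_idx}: reward = {reward}")
```

Under this convention the state index in each record is always one ahead of the reference index, which matches the statement that evaluation records s_{t+1} and r_t.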