PHC: Question about the logic of post_physics_step

Open zengweishuai opened this issue 1 year ago • 1 comment

Hi Luo, thanks for your great work! When running the code, I noticed that the logic of the post_physics_step function is:

1. update the progress buffer
2. compute the imitation reward
3. update the observation
4. record the values of mpjpe, body_pos and body_pos_gt if in evaluation mode

So if we denote the character's state at the current timestep as s_t and the reference pose as r_t, the imitation reward is computed from s_t and r_t, right? The new observation, as your paper suggests, should then be based on s_t and r_{t+1}. As a consequence, are the values of mpjpe, body_pos and body_pos_gt recorded during evaluation all based on s_t and r_{t+1}?

Sep 21 '24 15:09 zengweishuai

In your example, at time step t, the humanoid would have state s_t, and it needs to compute actions to match r_t.

The reward would be computed based on s_{t+1} and r_t, since you are judged based on the effect of your action.

So, in evaluation mode, we are recording s_{t+1} and r_t.
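
To make the indexing concrete, here is a minimal sketch of the pairing described above. It is a toy 1-D environment, not PHC code: the class ToyImitationEnv, its fields, and the additive "physics" are all hypothetical and only illustrate which state index is compared with which reference index.

```python
from dataclasses import dataclass, field


@dataclass
class ToyImitationEnv:
    """Hypothetical 1-D stand-in for an imitation environment (not PHC)."""
    reference: list                 # r_0, r_1, ... (assumed reference trajectory)
    state: float = 0.0              # s_t, the current character state
    t: int = 0                      # current time step
    eval_log: list = field(default_factory=list)

    def step(self, action: float) -> float:
        target_index = self.t       # the action chosen at step t tries to match r_t
        self.state += action        # toy "physics": s_t -> s_{t+1}
        return self.post_physics_step(target_index)

    def post_physics_step(self, target_index: int) -> float:
        r_t = self.reference[target_index]
        error = abs(self.state - r_t)   # compares s_{t+1} with r_t
        reward = -error                 # judged on the effect of the action
        # Evaluation record: (state index, reference index, error)
        self.eval_log.append((self.t + 1, target_index, error))
        self.t += 1
        return reward


env = ToyImitationEnv(reference=[1.0, 2.0, 3.0])
env.step(1.0)   # s_1 = 1.0 is compared with r_0 = 1.0
env.step(0.5)   # s_2 = 1.5 is compared with r_1 = 2.0
print(env.eval_log)
```

Each log entry pairs state index t+1 with reference index t, which is the s_{t+1} and r_t pairing recorded in evaluation mode.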

Oct 08 '24 19:10 ZhengyiLuo