Question about the function new_cal_re() in fullpace_env.py

Open WAYKEN-TSE opened this issue 2 years ago • 0 comments

I know that this function is used to calculate the extrinsic reward. But when PPO updates the network, the advantage function only includes the intrinsic reward (advantages = rollouts.returns[:-1] - rollouts.value_preds[:-1]). How, then, can the extrinsic reward influence the policy network, and what does this function do?
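
To make the question concrete, here is a minimal sketch of how I understand that advantage line. It assumes the returns are plain discounted sums of whatever reward was stored in the rollout buffer. The helper names `discounted_returns` and `advantages` are hypothetical and are not taken from fullpace_env.py or the repository's PPO code, and the numbers are made up.

```python
# Hypothetical helpers for illustration only; not the repository's code.

def discounted_returns(rewards, last_value, gamma=0.99):
    """Return a list of length len(rewards) + 1, bootstrapped from last_value."""
    returns = [0.0] * (len(rewards) + 1)
    returns[-1] = last_value
    # Walk backwards so each step adds its own reward to the discounted future.
    for t in reversed(range(len(rewards))):
        returns[t] = rewards[t] + gamma * returns[t + 1]
    return returns


def advantages(returns, value_preds):
    """Mirror of: advantages = rollouts.returns[:-1] - rollouts.value_preds[:-1]."""
    return [r - v for r, v in zip(returns[:-1], value_preds[:-1])]


# Made-up rollout: three steps of stored rewards and four value predictions.
stored_rewards = [1.0, 0.0, 2.0]
value_preds = [1.0, 1.0, 1.0, 0.0]

returns = discounted_returns(stored_rewards, last_value=0.0, gamma=0.5)
adv = advantages(returns, value_preds)
print(returns, adv)
```

Under that assumption the advantage depends only on the rewards that were written into the buffer, so my question comes down to whether the value computed by new_cal_re() is ever part of those stored rewards.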

Mar 16 '23 03:03 WAYKEN-TSE