On 'v' action rewards

Open · aletd opened this issue 4 years ago · 1 comment

Hello, dear Mr. Yuhang Song,

In the paper, it is stated that the rewards for action v are given by rewv, and that the parameters θv are optimized according to the rule: grad

In the code (https://github.com/YuhangSong/DHP/blob/73ddec2b837f0379cc5d0e008cd9dc422d832c3b/envs.py#L488-L502), no reward for v seems to be calculated. Instead, v_lable is estimated as a "weighted" target value (the sum of subject_i_v * similarity; see https://github.com/YuhangSong/DHP/blob/73ddec2b837f0379cc5d0e008cd9dc422d832c3b/suppor_lib.py#L154-L159), which then contributes an additional term, (v - v_lable)^2, to the loss function:

https://github.com/YuhangSong/DHP/blob/73ddec2b837f0379cc5d0e008cd9dc422d832c3b/a3c.py#L238-L239
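To make the mechanism described above concrete, here is a minimal sketch of a similarity-weighted target followed by a squared-error term. It only mirrors the description in this issue (label = sum of subject_i_v * similarity, loss term = (v - v_label)^2); the helper names `weighted_v_label` and `v_regression_loss` are hypothetical and are not taken from the DHP repository.

```python
from typing import Sequence


def weighted_v_label(subject_v: Sequence[float], similarity: Sequence[float]) -> float:
    """Hypothetical helper: target speed as sum_i subject_i_v * similarity_i.

    If the similarity weights happen to sum to 1, this is a weighted average
    of the subjects' speeds; otherwise it is just the weighted sum.
    """
    if len(subject_v) != len(similarity):
        raise ValueError("subject_v and similarity must have the same length")
    return sum(v_i * s_i for v_i, s_i in zip(subject_v, similarity))


def v_regression_loss(v_pred: float, v_label: float) -> float:
    """Squared-error term (v - v_label)^2 that would be added to the total loss."""
    return (v_pred - v_label) ** 2


# Usage: three subjects whose similarity weights sum to 1.
subject_v = [0.2, 0.5, 0.8]
similarity = [0.5, 0.3, 0.2]
label = weighted_v_label(subject_v, similarity)  # approximately 0.41
loss = v_regression_loss(0.5, label)             # approximately 0.0081
```

Because this term is a plain regression toward a fixed label, its gradient pulls v directly toward the subjects' weighted speed. No reward is accumulated over time.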

Is there a particular reason why the direct sum of rewards is not calculated and the above approach is used instead?
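For contrast, the alternative raised in the question (summing rewards over time and using them in a policy-gradient update) can be sketched generically. The sketch below is a standard REINFORCE-with-baseline surrogate. It assumes a scalar Gaussian policy over v with a fixed standard deviation and a learned baseline V. It is not the paper's update rule or DHP code, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def gaussian_log_prob(x: float, mean: float, std: float) -> float:
    """log N(x; mean, std^2) for a scalar Gaussian policy over v."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def reinforce_v_loss(
    actions: Sequence[float],
    means: Sequence[float],
    std: float,
    rewards: Sequence[float],
    values: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Surrogate loss -sum_t log pi(v_t) * (G_t - V_t).

    In an autodiff framework, the gradient of this value with respect to the
    policy parameters (through `means`) is the policy-gradient estimate; the
    advantage G_t - V_t would be treated as a constant.
    """
    returns = discounted_returns(rewards, gamma)
    loss = 0.0
    for v_t, mu_t, g_t, b_t in zip(actions, means, returns, values):
        advantage = g_t - b_t
        loss -= gaussian_log_prob(v_t, mu_t, std) * advantage
    return loss
```

The key difference from the regression term is that v is only reinforced in proportion to the (delayed, summed) reward it earns, rather than being pulled directly toward a supervised label.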

aletd, Feb 28 '20 12:02

Bump!

aletd, Mar 12 '20 15:03