bayesian_irl
Unexpected estimated rewards
Hello, thank you for sharing your code. When I run `python src/birl.py`, this is the output I get:
```
sub optimal actions 8/200
100%|████████████████████████████| 5000/5000 [00:16<00:00, 310.46it/s]
True rewards: [0, 0, 0.7, 0.7]
Estimated rewards: [-0.8338625 -0.549475 -0.5760875 -0.2005125]
expert_q_values
[[8.866157 8.422768]
 [9.091539 8.422768]
 [9.336886 9.122767]
 [9.570042 9.122767]]
learner_q_values
[[-9.23189  -9.604087]
 [-8.833096 -9.319699]
 [-8.967461 -9.346312]
 [-8.719601 -8.970737]]
Is a0 optimal action for all states: True
```
Looking at the values of `True rewards` and `Estimated rewards` in particular: is this the expected behaviour? I would have expected the true and estimated rewards to be very close in value. Have I misinterpreted the output?
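One thing I tried while investigating: since the last line of the output says `a0` is optimal in every state under both sets of Q-values, I wondered whether the estimated rewards might simply be a policy-preserving transformation of the true ones (e.g. a uniform shift), which IRL generally cannot distinguish. Here is a minimal sketch on a hypothetical 4-state chain MDP (not your actual environment; `q_values`, the transition rule, and the discount are all my own assumptions), showing that shifting every reward by a constant changes the Q-values drastically but leaves the greedy policy untouched:

```python
import numpy as np

def q_values(rewards, gamma=0.9, iters=500):
    """Value iteration on a hypothetical 4-state chain MDP:
    action 0 moves right (sticky at the last state), action 1 stays put.
    Reward depends only on the current state."""
    n = len(rewards)
    # nxt[s, a] = successor state of taking action a in state s
    nxt = np.array([[min(s + 1, n - 1), s] for s in range(n)])
    V = np.zeros(n)
    for _ in range(iters):
        Q = rewards[:, None] + gamma * V[nxt]
        V = Q.max(axis=1)
    return Q

true_r = np.array([0.0, 0.0, 0.7, 0.7])
shifted_r = true_r - 0.8  # very different numbers, same ordering

q_true = q_values(true_r)
q_shift = q_values(shifted_r)

# Q-values differ by a large constant, but the greedy policies agree.
print(q_true.argmax(axis=1))
print(q_shift.argmax(axis=1))
```

In this toy case the shifted Q-values are all negative (much like the `learner_q_values` above), yet the argmax per state is identical, so a matching policy with mismatched reward magnitudes seems at least plausible. I'd appreciate confirmation that this is what is happening in `birl.py`.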