bayesian_irl
Unexpected estimated rewards
Hello, thank you for sharing your code. When I run `python src/birl.py`, this is the output I get:
```
sub optimal actions 8/200
100%|████████████████████████████| 5000/5000 [00:16<00:00, 310.46it/s]
True rewards: [0, 0, 0.7, 0.7]
Estimated rewards: [-0.8338625 -0.549475 -0.5760875 -0.2005125]
expert_q_values
[[8.866157 8.422768]
 [9.091539 8.422768]
 [9.336886 9.122767]
 [9.570042 9.122767]]
learner_q_values
[[-9.23189  -9.604087]
 [-8.833096 -9.319699]
 [-8.967461 -9.346312]
 [-8.719601 -8.970737]]
Is a0 optimal action for all states: True
```
Looking at the values of `True rewards` and `Estimated rewards` in particular: is this the expected behaviour? I would have expected the true and estimated rewards to be very close in value. Have I misinterpreted the output?
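One thing I tried while investigating: since the last line of the output says `a0` is optimal in every state under both sets of Q-values, I wondered whether the estimated rewards might simply be a policy-preserving transformation of the true ones (e.g. a uniform shift), which IRL generally cannot distinguish. Here is a minimal sketch on a hypothetical 4-state chain MDP (not your actual environment; `q_values`, the transition rule, and the discount are all my own assumptions), showing that shifting every reward by a constant changes the Q-values drastically but leaves the greedy policy untouched:

```python
import numpy as np

def q_values(rewards, gamma=0.9, iters=500):
    """Value iteration on a hypothetical 4-state chain MDP:
    action 0 moves right (sticky at the last state), action 1 stays put.
    Reward depends only on the current state."""
    n = len(rewards)
    # nxt[s, a] = successor state of taking action a in state s
    nxt = np.array([[min(s + 1, n - 1), s] for s in range(n)])
    V = np.zeros(n)
    for _ in range(iters):
        Q = rewards[:, None] + gamma * V[nxt]
        V = Q.max(axis=1)
    return Q

true_r = np.array([0.0, 0.0, 0.7, 0.7])
shifted_r = true_r - 0.8  # very different numbers, same ordering

q_true = q_values(true_r)
q_shift = q_values(shifted_r)

# Q-values differ by a large constant, but the greedy policies agree.
print(q_true.argmax(axis=1))
print(q_shift.argmax(axis=1))
```

In this toy case the shifted Q-values are all negative (much like the `learner_q_values` above), yet the argmax per state is identical, so a matching policy with mismatched reward magnitudes seems at least plausible. I'd appreciate confirmation that this is what is happening in `birl.py`.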