HandyRL

(To be discussed) (Idea) feature: multi dimensional reward

Open YuriCat opened this issue 4 years ago • 2 comments

Do we delete OUTCOME, or use OUTCOME as the first dimension of REWARD if it is defined?

YuriCat • Nov 18 '21 15:11

My first impression is that this change would bring the implementation closer to general reinforcement learning. It could also make the code simpler 👍 However, I think users would need to be aware that the first dimension of the multi-dimensional reward is the outcome. How would we set gamma in games with no outcome? Something like gamma: [1, 0.99]?

ikki407 • Nov 22 '21 03:11

Yes, gamma: [1, 0.99] will work. We would also need to add a warning for when the gamma list is longer than needed.

YuriCat • Nov 22 '21 05:11
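
As a rough illustration of the idea discussed above, here is a minimal sketch of per-dimension discounting with a gamma list such as [1, 0.99]. It is not HandyRL code: the function name `discounted_returns` and its arguments are hypothetical, and treating "over length" as "more gamma values than reward dimensions" is an assumption about what the warning would cover.

```python
import warnings


def discounted_returns(rewards, gammas):
    """Compute discounted returns separately for each reward dimension.

    rewards: list of per-step reward vectors, e.g. [[outcome, reward], ...]
    gammas:  one discount factor per reward dimension, e.g. [1, 0.99]

    Hypothetical helper for illustration only; not part of HandyRL.
    """
    if not rewards:
        return []
    dim = len(rewards[0])

    # Assumed behavior: warn and truncate when the gamma list is too long.
    if len(gammas) > dim:
        warnings.warn(
            f"gamma has {len(gammas)} entries but reward has {dim} dimensions; "
            "extra entries are ignored"
        )
        gammas = gammas[:dim]
    elif len(gammas) < dim:
        raise ValueError("gamma needs one entry per reward dimension")

    returns = [None] * len(rewards)
    running = [0.0] * dim
    # Walk backwards so each step reuses the return of the following step.
    for t in reversed(range(len(rewards))):
        running = [r + g * acc for r, g, acc in zip(rewards[t], gammas, running)]
        returns[t] = running
    return returns


# First dimension plays the role of the outcome (gamma 1, given only at the
# final step); the second is an ordinary per-step reward (gamma 0.99).
episode = [[0.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
returns = discounted_returns(episode, [1, 0.99])
```

With gamma 1 on the first dimension, the final outcome propagates undiscounted to every earlier step, which is why that dimension could stand in for OUTCOME under this sketch.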