
Greedy and EpsilonGreedy strategies using multi-armed bandit algorithms

Open bing-j opened this issue 1 year ago • 3 comments

Hello! I wrote some strategies that use multi-armed bandit algorithms. Originally I only wanted to implement the epsilon-greedy strategy, but I now plan on extending this effort and implementing all the algorithms mentioned in the multi-armed bandit chapter of Sutton and Barto's Reinforcement Learning: An Introduction (I added the reference to the bibliography). So the branch name is no longer very representative; I added both Greedy and EpsilonGreedy on this branch.

Greedy: Always chooses the action with the highest average/expected "reward" (score), calculated from its own previous turns. The reward estimates are updated incrementally and can optionally be recency-weighted, and the initial expected reward of each action defaults to zero unless overridden through a parameter.
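For anyone curious, the incremental update described above can be sketched as follows (a minimal illustration, not the actual code on the branch; the function name and signature are hypothetical):

```python
def update_estimate(q, reward, count, alpha=None):
    """Incrementally update an action-value estimate.

    With alpha=None this is the sample-average rule
    Q_{n+1} = Q_n + (R_n - Q_n) / n, which converges to the true mean;
    a fixed alpha in (0, 1] gives the recency-weighted (exponentially
    decaying) variant instead.
    """
    step = (1.0 / count) if alpha is None else alpha
    return q + step * (reward - q)
```

With `alpha=None` the estimate is exactly the running average of observed rewards; a constant `alpha` weights recent rewards more heavily, which matters against opponents whose behavior drifts over time.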

EpsilonGreedy: Mostly works like Greedy (choosing the best-known action with probability 1 − ε), but explores by acting uniformly at random (with probability ε).
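The action-selection rule amounts to something like this (again a simplified sketch with hypothetical names, not the code on the branch):

```python
import random

def epsilon_greedy_choice(estimates, epsilon, rng=random):
    """Select an action from a dict mapping action -> estimated reward.

    With probability epsilon, pick uniformly at random (exploration);
    otherwise pick the action with the highest estimate (exploitation).
    """
    actions = list(estimates)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=estimates.get)
```

Setting `epsilon=0` recovers plain Greedy, so the two strategies share almost all of their logic.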

These strategies are described in detail in the textbook mentioned above as well.

As I've mentioned on Gitter, I was unable to find any existing strategies that implement these algorithms, although I did find some similar ones. For example, Adaptive() works similarly to Greedy() without weights, but it has a hard-coded initial sequence and uses the raw sum of scores, rather than the average score, to choose the optimal play. (Side note: the comments in Adaptive().strategy() indicate that it was intended to use the highest average, so this may be a bug!) If similar strategies already exist, and/or there are any modifications I need to make in the code, please let me know!

Cheers :)

bing-j avatar Mar 17 '24 00:03 bing-j