Tensorflow-DeepMind-Atari-Deep-Q-Learner-2Player
Pong2Player environment usage
Hey, sorry for the late reply. We are still working on this project, so it won't be complete for another month or so. I saw that you are interested in an API for a 2-player Pong game. You can actually look at the API of Xitari2Player, which I ported over into Python at https://github.com/choo8/Xitari2Player. The API calls there should let you give actions to both players in Pong.
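By way of illustration, here is a minimal sketch of what driving both players through the Python port might look like. The reward and game-over calls are the ones mentioned later in this thread; the name of the two-player act call is a hypothetical placeholder, so check ale_interface.hpp and the Python bindings in the repo for the real signature.

```python
# `ale` is assumed to be an initialised Xitari2Player binding with the ROM loaded
score_a, score_b = 0, 0
while not ale.ale_isGameOver():
    action_a = 3     # e.g. one of player A's Pong actions
    action_b = 23    # the corresponding player B action
    ale.ale_act2(action_a, action_b)   # hypothetical two-player step call
    score_a += ale.ale_getRewardA()
    score_b += ale.ale_getRewardB()
```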
Hi, thanks for your response. I tried to run main_2.py on the training branch, and it runs fine once you edit 'history.py': just change the else branch of the get function to np.transpose(self.history, (1, 2, 0)).
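For reference, a minimal sketch of what the patched get function in history.py might look like. The attribute names (cnn_format, history_length, etc.) are assumptions based on the DQN codebase this project appears to follow, not a quote of the actual file.

```python
import numpy as np

class History(object):
    """Rolling buffer of the last `history_length` screens, stored channel-first."""

    def __init__(self, config):
        self.cnn_format = config.cnn_format
        self.history = np.zeros(
            [config.history_length, config.screen_height, config.screen_width],
            dtype=np.float32)

    def add(self, screen):
        self.history[:-1] = self.history[1:]
        self.history[-1] = screen

    def get(self):
        if self.cnn_format == 'NCHW':
            return self.history
        else:
            # the fix discussed above: move the channel axis last (NHWC)
            return np.transpose(self.history, (1, 2, 0))
```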
OK, let me know if you have more questions about getting the Pong2Player code running.
Thanks. I am trying to rewrite a version based on your work, because I am not familiar with using ALE and so on. I also can't understand your definition of reward in agent.py: def observe(self, screen, reward, action, terminal): reward = max(self.min_reward, min(self.max_reward, reward))
Would you mind explaining this to me? In my view, reward = reward would be fine.
I believe this is useful if you want to perform clipping of rewards. You could also do reward = reward; it should work as well. The reward is obtained from ale.ale_getRewardA() or ale.ale_getRewardB().
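As a rough illustration of what that clipping line does (the clip_reward helper below is only for exposition; min_reward and max_reward are assumed to be -1 and +1, matching the range discussed next):

```python
def clip_reward(reward, min_reward=-1.0, max_reward=1.0):
    # clamp the raw reward into [min_reward, max_reward];
    # equivalent to the max(min(...)) expression in agent.py
    return max(min_reward, min(max_reward, reward))

# per-player rewards from the two-player ALE interface would each be clipped:
# reward_a = clip_reward(ale.ale_getRewardA())
# reward_b = clip_reward(ale.ale_getRewardB())
```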
Can you tell me the range of the reward? I have read "Multiagent Cooperation and Competition with Deep Reinforcement Learning", and in that paper the reward is between -1 and 1. Thanks a lot.
Yes, I believe it is within the range of -1 and 1. The values, determined by the ROM used, should be as described in the paper "Multiagent Cooperation and Competition with Deep Reinforcement Learning".
Sorry to bother you again. Why are the actions [0, 1, 3, 4] for agent 1 and [20, 21, 23, 24] for agent 2? I know there are four possible actions [none, fire, up, down]. Is this defined in roms/Pong2Player025.bin? Thanks.
This is actually an implementation of the Xitari2Player environment. You can see the full list of actions at https://github.com/choo8/Xitari2Player/blob/master/ale_interface.hpp. I only included the 4 relevant actions in the training script.
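For what it's worth, here is a small sketch of how the two action sets relate. The exact enum names live in ale_interface.hpp linked above; treating player B's actions as player A's values offset by 20 is simply my reading of the numbers in this thread.

```python
# the four Pong-relevant actions for player A: noop, fire, up, down
ACTIONS_A = [0, 1, 3, 4]

# player B uses the same four actions from the player-B block of the enum,
# which here sits at an offset of 20 from the player-A values
ACTIONS_B = [a + 20 for a in ACTIONS_A]   # -> [20, 21, 23, 24]
```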
Thanks. And where can I find the definition of ale.ale_isGameOver? I found it hard to reach a state where ale.ale_isGameOver is true. From my observation, is it right that whichever side first scores 20 points ends the game for that epoch?
According to the paper, a game of Pong ends when 21 points are scored by either agent. Epochs are determined by the number of iterations, where 250,000 iterations equal one epoch. These hyperparameters are also the ones used in the original paper.
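A rough sketch of how the game-over check and the iteration-based epochs could interact in the training loop. ale.ale_isGameOver comes from the discussion above; the reset call, the epoch count, and the bookkeeping are assumptions for illustration only.

```python
STEPS_PER_EPOCH = 250000   # one epoch = 250,000 training iterations
MAX_EPOCHS = 200           # arbitrary value for illustration

global_step = 0
for epoch in range(MAX_EPOCHS):
    for _ in range(STEPS_PER_EPOCH):
        # ... select and send actions for both players here ...
        if ale.ale_isGameOver():   # true once either side scores 21 points
            ale.ale_resetGame()    # assumed reset call; check the bindings for the real name
        global_step += 1
    # end of epoch: evaluate / log / save a checkpoint here
```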
Thanks a lot. I think there might be some mistakes in your code regarding actions.
Taking agent 2 as an example, the valid actions are [20, 21, 23, 24], but the network outputs indices in [0, 1, 2, 3], so a mapping from [0, 1, 2, 3] to [20, 21, 23, 24] is needed; using the network output as an index into [20, 21, 23, 24] works.
Also, the exploration rate changes with agent.step, but at the beginning of a new epoch agent.step restarts from 0. The exploration rate shouldn't restart from ep_start; it should keep decaying from the value it reached at the last step of the previous epoch.
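A minimal sketch of those two fixes. ep_start appears in the thread; ep_end, the decay horizon, and the function names are hypothetical stand-ins for the usual DQN epsilon-schedule parameters, and the key point is that the step counter driving the decay is never reset between epochs.

```python
import random

ACTIONS_B = [20, 21, 23, 24]   # valid actions for agent 2

def epsilon(global_step, ep_start=1.0, ep_end=0.1, ep_end_t=1000000):
    # linear decay driven by a global step counter that is never reset between epochs
    frac = min(1.0, global_step / float(ep_end_t))
    return ep_start + frac * (ep_end - ep_start)

def select_action_b(q_values, global_step):
    # q_values has one entry per network output index 0..3; map the chosen
    # index to the actual ALE action value for player B
    if random.random() < epsilon(global_step):
        idx = random.randrange(len(ACTIONS_B))
    else:
        idx = max(range(len(ACTIONS_B)), key=lambda i: q_values[i])
    return ACTIONS_B[idx]
```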