Joshua Choo Yun Keat
Hi @cmarmo, I've submitted a pull request (https://github.com/scikit-learn/scikit-learn/pull/24579) based on @NicolasHug's comments in https://github.com/scikit-learn/scikit-learn/issues/20435#issuecomment-872835169. Could you help me take a look at it? Thanks!
@cmarmo I would like to help with this issue. How should this PR be done given #24764?
Thank you for the comments, @cmarmo and @glemaitre. I will work on a PR incorporating what both of you have suggested.
Hey, sorry for the late reply. We are still working on this project so it won't be complete for another month or so. I saw that you are interested in...
OK, let me know if you have more questions about getting the Pong2Player code running.
I believe this is useful if you want to clip rewards. You could also do `reward = reward`; it should work as well.
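To illustrate the two options, here is a minimal Python sketch of reward clipping next to the pass-through `reward = reward` variant. It is not the code from the training script; the function names and the default bounds of -1 and 1 are assumptions made for the example.

```python
def clip_reward(reward, low=-1.0, high=1.0):
    """Clamp a raw reward into [low, high] (hypothetical helper, assumed bounds)."""
    return max(low, min(high, reward))


def passthrough_reward(reward):
    """Equivalent of `reward = reward`: the raw reward is used unchanged."""
    return reward


if __name__ == "__main__":
    for raw in (-3.0, -1.0, 0.5, 2.0):
        print(raw, clip_reward(raw), passthrough_reward(raw))
```

If the environment already emits rewards inside the clipping range, both functions return the same values, which is why either choice should work.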
Yes, I believe it is within the range of -1 to 1. The values, decided by the ROM used, should be as described in the paper "Multiagent Cooperation and Competition...
This is actually an implementation of the Xitari2Player environment. You can see the full list of actions at https://github.com/choo8/Xitari2Player/blob/master/ale_interface.hpp. I only included the 4 relevant actions in the training script.
According to the paper, a game of Pong ends when either agent scores 21 points. Epochs are determined by the number of iterations, where 250000 iterations would equal to...
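As a rough illustration of the 21-point rule only, here is a small Python sketch that tallies points until one agent reaches the winning score. The function name, the 0/1 agent indexing, and the input format are all assumptions for the example and are not taken from the Xitari2Player code.

```python
WINNING_SCORE = 21  # a game of Pong ends when either agent reaches this score


def play_out(point_winners, winning_score=WINNING_SCORE):
    """Tally points in order and stop once one agent reaches winning_score.

    point_winners: iterable of 0 or 1, the index of the agent that won each point
    (hypothetical input format). Returns ((score_0, score_1), finished).
    """
    scores = [0, 0]
    for winner in point_winners:
        scores[winner] += 1
        if max(scores) >= winning_score:
            return tuple(scores), True
    return tuple(scores), False


if __name__ == "__main__":
    print(play_out([0, 1] * 10))  # neither agent has reached 21 yet
    print(play_out([0] * 25))     # stops as soon as agent 0 reaches 21
```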