rezunli96
I am a little confused about your implementation of MAA2C. I don't think the input of the actor network is simply the "joint state" of the agents. According to [1] the critic's...
Hello! I would like to add example code that demonstrates how MCTS can be used as a best-response oracle in PSRO. Please let me know if there are any...
Hello there. I am currently working on pathfinding games and found the implementation here really helpful! I have some minor questions about the implementation. According to the reference [here](https://www.jmlr.org/papers/volume4/hu03a/hu03a.pdf) there may be several...
Hello. In the current implementation of psro_v2, right before computing the best response, it first selects a subset of strategies from the current strategy pool and...
Hi, I have a question about the use of Dirichlet noise in AlphaZero that has confused me for a while. I understand it is used for exploration at...
Hi, I recently ran into some confusion when trying to reproduce your work, particularly with experiment (1) on Gaussian squeezing. As I understand it, in order to implement the MAA2C algorithm as...
Hi, I have recently been trying to reproduce your work and am a little confused about implementing MF-AC. According to the algorithm, at some point the MF-Value (10) should be calculated, where...