rezunli96 issues

Results: 7 issues of rezunli96

The Actor Critic Structure in MAA2C

I am a little confused about your implementation of MAA2C. I don't think the input of the actor network is simply the "joint state" of the agents. According to [1] the critic's...

PSRO-MCTS

Hello! I would like to add example code demonstrating that MCTS can be used as a best-response oracle in PSRO. Please let me know if there are any...

Minor discussions about pathfinding games

Hello there. I am currently working on pathfinding games and found the implementation here really helpful! I have some minor questions about the implementation. According to the reference [here](https://www.jmlr.org/papers/volume4/hu03a/hu03a.pdf) there may be several...

non-marginal strategy selectors in psro_v2

Hello. In the current implementation of psro_v2, right before it computes a best response, it first selects a subset of strategies from the current strategy pool and...

In AlphaZero, is the Dirichlet noise applied once per search call, or once at every simulation step within a search call?

Hi, I have one question about the use of Dirichlet noise in AlphaZero that has confused me for a while. I understand it is used for exploration at...
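
For reference on what this question is about, here is a minimal sketch of the noise-mixing step itself, written with the standard library only. The function name `add_dirichlet_noise` and the defaults `alpha=0.3` and `epsilon=0.25` are assumptions chosen for illustration, not taken from the issue or from any particular library. The sketch does not settle where a given implementation calls this step. Applying it once to the root priors per search call is one common reading of the AlphaZero paper and is treated as an assumption here.

```python
import random


def add_dirichlet_noise(priors, alpha=0.3, epsilon=0.25, rng=None):
    """Return (1 - epsilon) * p + epsilon * eta, with eta ~ Dir(alpha).

    Hypothetical helper: name and default values are illustrative assumptions.
    """
    rng = rng or random.Random()
    # Sample a symmetric Dirichlet by normalizing independent Gamma(alpha, 1) draws.
    gammas = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(gammas)
    noise = [g / total for g in gammas]
    # Convex mix of the original priors and the noise, so the result still sums to 1.
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]


# Assumed usage: perturb the root node's priors before running simulations.
root_priors = [0.5, 0.3, 0.2]
noisy_priors = add_dirichlet_noise(root_priors, rng=random.Random(0))
```

Because the mix is convex, each noisy prior stays at least `(1 - epsilon)` times its original value, so the network's preferences are kept while every move retains some chance of being explored.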

How to implement MAAC/MFAC for Gaussian Squeezing?

Hi, I recently became confused while trying to reproduce your work, particularly about experiment (1) on Gaussian squeezing. According to my understanding, in order to implement the MAA2C algorithm as...

How to calculate MF-Value (eq(10)) in MF-AC/MF-Q

Hi, I have recently been trying to reproduce your work and am a little confused about implementing MF-AC. According to the algorithm, the MF-Value (10) should be calculated at some point, where...