rezunli96 comments

8 comments

Building and Running the Code

I encountered the same problem. Have you solved it?

PSRO-MCTS

I also added an example using IS-MCTS as the BR in the latest PR. It was implemented similarly to PSRO-MCTS.

BestResponsePolicy requires concrete policy probability values, fails with JAX abstract tracer values

Is it because a jitted function calls BestResponsePolicy? If so, I think it is normally not solvable, because the execution flow of a jitted JAX...
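If a jitted caller is indeed the cause, the failure can be reproduced in isolation. The sketch below assumes that BestResponsePolicy needs plain Python floats for the policy probabilities. `needs_concrete_probs`, `policy_probs` and `broken` are hypothetical stand-ins, not OpenSpiel functions. Inside `jax.jit` the probabilities are abstract tracers, so `float()` raises `jax.errors.ConcretizationTypeError`. One possible workaround is to keep only the array math under `jit` and pass the resulting concrete array to the non-jittable code outside of it.

```python
import jax
import jax.numpy as jnp


def needs_concrete_probs(probs):
    """Hypothetical stand-in for code (e.g. a best-response computation)
    that reads policy probabilities as plain Python floats."""
    return {action: float(p) for action, p in enumerate(probs)}


@jax.jit
def policy_probs(logits):
    # Only array operations here, so tracing under jit is fine.
    return jax.nn.softmax(logits)


@jax.jit
def broken(logits):
    # Under jit, `probs` is an abstract tracer with no concrete value,
    # so float() inside needs_concrete_probs fails while tracing.
    probs = jax.nn.softmax(logits)
    return needs_concrete_probs(probs)


logits = jnp.array([1.0, 2.0, 3.0])

# Workaround: run the jitted part first, then hand the concrete
# result to the code that needs Python floats, outside of jit.
table = needs_concrete_probs(policy_probs(logits))

try:
    broken(logits)
    raised = False
except jax.errors.ConcretizationTypeError:
    raised = True
```

This only helps if the best-response step can run outside the jitted region. If the whole loop must be traced, the code that reads the probabilities would need to be rewritten in terms of JAX array operations.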

Simplest 2 player perfect information game?

Hi, pathfinding seems like a good choice. You can specify `kExampleMultiAgentGrid` to customize the grid you want. This game is general-sum; see, e.g., [this paper](https://www.jmlr.org/papers/volume4/hu03a/hu03a.pdf).

How to calculate MF-Value (eq(10)) in MF-AC/MF-Q

It just occurred to me: is the return of a sampled trajectory an unbiased estimator of the MF-Value? That works for REINFORCE-like AC, but I am still confused about how to calculate it for off-policy RL...
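To make the on-policy versus off-policy distinction concrete, here is a minimal sketch in plain Python. It is a generic Monte Carlo illustration, not the MF-AC/MF-Q method, and it ignores the mean-action conditioning in eq. (10). If the trajectory is sampled from the policy being evaluated, the discounted return G_t is an unbiased estimate of that policy's value at s_t. For off-policy data, one standard (high-variance) option is ordinary importance sampling, which reweights the return by the product of target/behavior action-probability ratios. The function names are illustrative only.

```python
def discounted_returns(rewards, gamma):
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def importance_weighted_return(rewards, gamma, target_probs, behavior_probs):
    """Ordinary importance-sampling estimate of the target policy's value at
    the first state, from one trajectory generated by a behavior policy.

    target_probs[t] / behavior_probs[t] are the probabilities that the target
    and behavior policies assign to the action actually taken at step t.
    Unbiased only if the behavior policy gives nonzero probability wherever
    the target policy does.
    """
    ratio = 1.0
    for pi, mu in zip(target_probs, behavior_probs):
        ratio *= pi / mu
    return ratio * discounted_returns(rewards, gamma)[0]


# On-policy: the returns themselves are the value estimates.
on_policy = discounted_returns([1.0, 1.0, 1.0], 0.5)
# Off-policy: the return from s_0 is reweighted by (0.5 / 0.25) ** 3 = 8.
off_policy = importance_weighted_return(
    [1.0, 1.0, 1.0], 0.5, [0.5, 0.5, 0.5], [0.25, 0.25, 0.25]
)
```

Here `on_policy` is `[1.75, 1.5, 1.0]` and `off_policy` is `14.0`. Because the product of ratios can blow up over long horizons, a learned Q-function plugged into the expectation of eq. (10) is usually preferred off-policy.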

Call for New Games

Hi @thorsten-j, do you know which version of Mahjong Gravon has? The one I am currently looking into is Riichi Mahjong.

Call for New Games

@thorsten-j Sounds great! Then I don't think we really have a conflict here, and we can work on our own versions of Mahjong separately. Maybe there is a unified...

non-marginal strategy selectors in psro_v2

Hi, sorry for the late reply; I just noticed this. It seems the external version still has this issue.