PokerRL
Observation space / Infostate
Hey Eric,
thanks for making this public; I haven't found a good env so far that implements multiplayer NL. Am I understanding the code correctly that the observation space isn't actually perfect information, e.g. that it only contains the last couple of actions? Do you have any research on how this affects convergence? I had a bit of trouble understanding the code, so I apologize if I just didn't read it right.
Heyo,
You're welcome! I've written a wrapper that tracks the action history for limit games. For no-limit games, this piece is slightly less obvious, so I would default to the recurrent option that tracks the history of public observations, which is also sound. So perfect recall is still supported for NL through recurrent NNs, but you correctly spotted that the action history is only tracked implicitly through that, not explicitly.
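For intuition, here's a minimal sketch of the recurrent idea: an LSTM consumes the per-step public observations, so its hidden state implicitly encodes the full (perfect-recall) history. This is not PokerRL's actual network; `obs_dim`, `n_actions`, and the layer sizes are placeholders.

```python
# Minimal sketch of the recurrent option (not PokerRL's actual architecture).
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        # The LSTM's hidden state accumulates the history of public observations.
        self.lstm = nn.LSTM(input_size=obs_dim, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, state=None):
        # obs_seq: [batch, seq_len, obs_dim] -- one public observation per step
        out, state = self.lstm(obs_seq, state)
        logits = self.head(out[:, -1])  # act on the latest hidden state
        return logits, state
```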
If you wish to track it explicitly, you could just track it manually in the training code or write an appropriate "Wrapper" for the NL environment (a sketch of what that could look like is below). However, this will require you to adjust the NN's observation space accordingly. TL;DR: recurrent is cleaner and more scalable.
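A minimal sketch of such a wrapper, assuming a generic env with `reset()`/`step(action)` returning flat NumPy observations; `ActionHistoryWrapper`, `MAX_ACTIONS`, and the `(action_type, bet_fraction)` encoding are all hypothetical, not PokerRL's actual API:

```python
# Hypothetical wrapper that tracks the action history explicitly and appends
# a zero-padded, flattened copy of it to each flat observation.
import numpy as np

MAX_ACTIONS = 64   # assumed cap on actions per hand
ACTION_DIM = 2     # assumed encoding: (action_type, bet_size_fraction)

class ActionHistoryWrapper:
    def __init__(self, env):
        self.env = env
        self.history = []

    def reset(self):
        self.history = []
        return self._augment(self.env.reset())

    def step(self, action):
        # A complete encoding would also record the acting seat and street.
        self.history.append(np.asarray(action, dtype=np.float32))
        obs, reward, done, info = self.env.step(action)
        return self._augment(obs), reward, done, info

    def _augment(self, obs):
        # Flatten the history into a fixed-size, zero-padded vector.
        flat = np.zeros(MAX_ACTIONS * ACTION_DIM, dtype=np.float32)
        for i, a in enumerate(self.history[:MAX_ACTIONS]):
            flat[i * ACTION_DIM:(i + 1) * ACTION_DIM] = a
        return np.concatenate([np.asarray(obs, dtype=np.float32), flat])
```

Note that the NN's input layer would then need to grow by `MAX_ACTIONS * ACTION_DIM` to match the augmented observation.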
Cheers, Eric