
Use trained model .pth in No-Limit Holdem

Open DanielusG opened this issue 3 years ago • 14 comments

Hi, I trained a DQN model in No-Limit Hold'em, and the training session produced a .pth file. How can I use it to play against the model?

P.S. Thank you very much for having made this powerful library available :)

DanielusG avatar Sep 27 '21 16:09 DanielusG

@DanielusG Thanks for the feedback. You can use the evaluation script at https://github.com/datamllab/rlcard/blob/master/examples/evaluate.py to specify the game and the pre-trained weights and let the agents play against each other.
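
A minimal sketch of doing the same thing directly from Python, assuming the checkpoint was saved with `torch.save(agent, ...)` as in the example training scripts (the path below is hypothetical):

```python
import torch
import rlcard
from rlcard.agents import RandomAgent
from rlcard.utils import tournament

# Hypothetical path to the checkpoint produced during training
model_path = 'experiments/nolimit_holdem_dqn_result/model.pth'

env = rlcard.make('no-limit-holdem')

# Load the trained DQN agent; map_location keeps it on CPU even if it was trained on GPU
dqn_agent = torch.load(model_path, map_location='cpu')

# Seat the trained agent against a random agent and average payoffs over many games
env.set_agents([dqn_agent, RandomAgent(num_actions=env.num_actions)])
print(tournament(env, 1000))
```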

daochenzha avatar Sep 28 '21 05:09 daochenzha

Hello, I am using this library and it is really good. Thanks for the hard work.

I trained 6 DQN models on NLH for 1_000_000 episodes, evaluating every 10_000 episodes with 5_000 games. The best agent's model was saved as a .pth file. The whole process took around 16 hours on an RTX 3060. Do these training parameters seem right to you?

Now I would like to use this model by inputting the scenario data manually, running some sort of "predict()" function on the agent, and getting the action taken by the agent as the output/return value.

How can I do this?

alexx-ftw avatar Sep 04 '22 21:09 alexx-ftw

@alexx-ftw Thanks for the feedback. The hyperparameters seem good, but since NLH is a very hard game (especially when there are many players), the agent may still not perform as well as humans. To make a single prediction, you may refer to https://github.com/datamllab/rlcard/blob/558c3f71375a22e61a58e242a029864116430727/rlcard/envs/env.py#L144 which essentially takes a state as input and makes a prediction.
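
As an illustration (not the library's documented workflow), a single prediction with a loaded agent could look roughly like this, assuming the whole agent object was pickled with `torch.save` during training and the path is hypothetical:

```python
import torch
import rlcard

# Hypothetical checkpoint path; assumes the whole agent object was pickled during training
agent = torch.load('model.pth', map_location='cpu')

env = rlcard.make('no-limit-holdem')
state, player_id = env.reset()

# eval_step takes the encoded state dict ('obs', 'legal_actions', ...) and
# returns the chosen action id plus auxiliary info such as per-action values
action, info = agent.eval_step(state)
print(action, info)
```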

If you would like to improve NLH, you may need to do feature engineering by modifying https://github.com/datamllab/rlcard/blob/master/rlcard/envs/nolimitholdem.py. Also, you may try training with multiple GPUs using DMC.

If you want to explore other games and achieve human-like performance, you could try simpler games like Blackjack or Leduc Hold'em. You could also try Dou Dizhu, for which we have already spent a lot of effort on feature engineering.

daochenzha avatar Sep 04 '22 22:09 daochenzha

@daochenzha Thanks for the quick reply. I arrived at the same "eval_step" function by working backwards from the training example through the functions and classes.

I printed the state variable and it looks like a dict; I believe the "obs" value is a NumPy array.

{
  "legal_actions": {
    "0": null,
    "1": null,
    "3": null,
    "4": null
  },
  "obs": "[0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n 0. 0. 0. 0. 1. 2.]",
  "raw_obs": {
    "hand": [
      "D9",
      "S4"
    ],
    "public_cards": [],
    "all_chips": [
      1,
      2
    ],
    "my_chips": 1,
    "legal_actions": [
      "Action.FOLD",
      "Action.CHECK_CALL",
      "Action.RAISE_POT",
      "Action.ALL_IN"
    ],
    "stakes": [
      99,
      98
    ],
    "current_player": 0,
    "pot": "3",
    "stage": "Stage.PREFLOP"
  },
  "raw_legal_actions": [
    "Action.FOLD",
    "Action.CHECK_CALL",
    "Action.RAISE_POT",
    "Action.ALL_IN"
  ],
  "action_record": [
    [
      0,
      "Action.FOLD"
    ]
  ]
}

Do all those keys need to be filled with proper information for the agent to take an action?

By the way, how is the "obs" key structured? It looks like it would be the cards in the deck, but I am not sure since it has 54 parameters instead of 52.

alexx-ftw avatar Sep 04 '22 23:09 alexx-ftw

> If you would like to improve NLH, you may need to do feature engineering by modifying https://github.com/datamllab/rlcard/blob/master/rlcard/envs/nolimitholdem.py. Also, you may try training with multiple GPUs using DMC.
>
> If you want to explore other games and achieve human-like performance, you could try simpler games like Blackjack or Leduc Hold'em. You could also try Dou Dizhu, for which we have already spent a lot of effort on feature engineering.

Could you briefly comment on the advantages of DMC vs. DQN, or even NFSP, for NLH poker?

alexx-ftw avatar Sep 04 '22 23:09 alexx-ftw

> By the way, how is the "obs" key structured? It looks like it would be the cards in the deck, but I am not sure since it has 54 parameters instead of 52.

Okay, I found the "_extract_state()" function inside the "nolimitholdem.py" environment (a subclass of Env), which encodes the state of the game. I will keep digging.

Edit 1: I wasn't too far off. The first 52 entries (indices 0 to 51) correspond to the cards that have appeared face up in the game (the player's hand plus the community cards). Index 52 is the number of chips the player has at risk, and index 53 is the largest number of chips bet by any player.
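
A rough sketch of that encoding (a paraphrase of the idea behind `_extract_state`, not the exact library code; the helper name and arguments here are illustrative):

```python
import numpy as np

# Illustrative paraphrase of the 54-dimensional observation; not the exact rlcard code
def encode_obs(visible_card_indices, my_chips, all_chips):
    obs = np.zeros(54)
    obs[visible_card_indices] = 1     # indices 0-51: one-hot of hand + community cards
    obs[52] = float(my_chips)         # chips this player has committed
    obs[53] = float(max(all_chips))   # largest amount committed by any player
    return obs
```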

Why use the max() function instead of sum() to aggregate them, though?

alexx-ftw avatar Sep 05 '22 00:09 alexx-ftw

@alexx-ftw It is just a heuristic. sum() could be better, or the combination of sum() and max() could be even better. The current one is just an example of how to encode them. To further improve the agent, more effort is needed to try different encodings.
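
For instance, one variation (just a sketch, not a drop-in patch for `nolimitholdem.py`; the environment's configured state shape would also need to grow by one) could keep both aggregates:

```python
import numpy as np

# Sketch of an alternative encoding that keeps both max() and sum() of the bets;
# names and indices are illustrative, not the library's actual code
def encode_obs_with_sum(visible_card_indices, my_chips, all_chips):
    obs = np.zeros(55)
    obs[visible_card_indices] = 1
    obs[52] = float(my_chips)
    obs[53] = float(max(all_chips))   # largest bet by any player
    obs[54] = float(sum(all_chips))   # total chips committed by all players
    return obs
```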

daochenzha avatar Sep 05 '22 04:09 daochenzha

@alexx-ftw Did you ever get this working? How well does your model play? I am running a social poker platform and I'm looking for an AI that I can use to run my bots. It does not need to be very good, just good enough that it is fun to play against.

DerKorb avatar Sep 29 '22 13:09 DerKorb

> @alexx-ftw Did you ever get this working? How well does your model play? I am running a social poker platform and I'm looking for an AI that I can use to run my bots. It does not need to be very good, just good enough that it is fun to play against.

This project is just the base for you to build on top of. It won't get you a superhuman poker AI by itself, and you still need to code a lot of stuff to get it to do what you want. Extremely useful from my point of view, as otherwise you would need to code the game logic yourself.

alexx-ftw avatar Sep 29 '22 13:09 alexx-ftw

I don't quite understand. I thought that with the help of this project you could train a network that basically takes a table state as input and gives you the move it considers best; am I wrong with that assumption? I'm not looking for super-human. Quite the contrary: I am looking for a mediocre AI so the players on my site would have a solid chance of beating the bots.

DerKorb avatar Sep 29 '22 13:09 DerKorb

> I don't quite understand. I thought that with the help of this project you could train a network that basically takes a table state as input and gives you the move it considers best; am I wrong with that assumption? I'm not looking for super-human. Quite the contrary: I am looking for a mediocre AI so the players on my site would have a solid chance of beating the bots.

Yes, this project can do what you want out of the box.

alexx-ftw avatar Sep 29 '22 13:09 alexx-ftw

Nice! I do not want to spam this issue, so I contacted you at er*****@gmail; I hope you do not mind.

DerKorb avatar Sep 29 '22 14:09 DerKorb