dyypholdem
Poor performance against Slumbot
I ran 1,800 hands against Slumbot and got the following results:
- Earnings: -15.92 BB/100
- Baseline Earnings: -24.49 BB/100
- Num Hands: 1803
When I checked the weights:
| Street | Epoch | Loss |
|---|---|---|
| Preflop | 67 | 0.001736 |
| Flop | 50 | 0.067477 |
| Turn | 50 | 0.072931 |
| River | 95 | 0.057868 |
Is this poor performance due to the loss? How can I improve it? Thanks.
I tried to regenerate the data and re-train the model. With `python training/main_train.py 4`, the training loss doesn't go below 0.1 and the validation loss doesn't go below 0.11.
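For context on the numbers above: DeepStack-style value networks are typically trained with a Huber loss on the predicted counterfactual values, and a later comment in this thread quotes its results in Huber loss as well. A minimal sketch of how such a per-street validation loss is computed, assuming the repo follows that convention (`huber_loss` and `mean_huber` here are illustrative helpers, not the repo's actual functions):

```python
def huber_loss(pred, target, delta=1.0):
    """Huber loss: quadratic for errors below delta, linear above."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)

def mean_huber(preds, targets, delta=1.0):
    """Mean Huber loss over a validation set (e.g. predicted vs.
    target counterfactual values for one street's network)."""
    return sum(huber_loss(p, t, delta) for p, t in zip(preds, targets)) / len(preds)

# toy check: small prediction errors give a small loss
preds   = [0.10, 0.52, -0.31]
targets = [0.12, 0.50, -0.30]
print(round(mean_huber(preds, targets), 6))  # → 0.00015
```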
1,800 hands is meaningless. Besides, the original DeepStack didn't claim to beat Slumbot (although their models were much more accurate).
There are some tweaks that made DeepStack beat Slumbot by 15 bb/100.
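As a rough sanity check on why such a small sample says so little: assuming, purely for illustration, a per-hand standard deviation of about 10 big blinds (the true figure depends heavily on the bots' styles), the 95% confidence interval on a win rate measured over 1,800 hands is enormous:

```python
import math

def ci_halfwidth_bb100(std_per_hand_bb, num_hands, z=1.96):
    """95% CI half-width of a measured win rate, in bb/100 hands.
    The std of the mean shrinks with sqrt(n); scale to per-100-hands."""
    return z * std_per_hand_bb / math.sqrt(num_hands) * 100

# assumed, for illustration only: ~10 bb standard deviation per hand
print(round(ci_halfwidth_bb100(10, 1800), 1))       # → 46.2
print(round(ci_halfwidth_bb100(10, 1_000_000), 1))  # → 2.0
```

At roughly ±46 bb/100, a measured -15.92 bb/100 over 1,800 hands is statistically indistinguishable from break-even.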
> I tried to regenerate the data and re-train the model. With `python training/main_train.py 4`, the training loss doesn't go below 0.1 and the validation loss doesn't go below 0.11.
You need a much bigger sample size. The default is 100k; the models were trained on 1M samples.
> You need a much bigger sample size. The default is 100k; the models were trained on 1M samples.
I already used 1M samples.
> Besides, the original DeepStack didn't claim to beat Slumbot.

They claimed to have at least approached a Nash equilibrium, which means it should at least not lose against it, right?

> (although their models were much more accurate)

Do you know how much more accurate they were, and how many training samples they used compared to the models in this repo?

> There are some tweaks that made DeepStack beat Slumbot by 15 bb/100.

Can you elaborate on that, please? Edit: found this link.
> > You need a much bigger sample size. The default is 100k; the models were trained on 1M samples.
>
> I already used 1M samples.
For me, for example, the river network produced errors smaller than 0.1 on validation sets with samples of a couple hundred data points, and below 0.057 (better than the model in the repository) on a 1M sample, without any modification of the data-generation or training code, or even changing any parameters. (That said, the default data generation is 10k files, i.e. 100k samples.)
> They claimed to have at least approached a Nash equilibrium, which means it should at least not lose against it, right?
>
> Do you know how much more accurate they were, and how many training samples they used compared to the models in this repo?
>
> Can you elaborate on that, please? Edit: found this link.
Not necessarily. A Nash equilibrium means that neither player can improve their expectation (winnings or losses!) by changing only their own strategy. In theory, in heads-up NL hold'em a Nash equilibrium guarantees that you won't lose if the opponent plays perfectly, and that you win every time they deviate from that play. But in practice neither player plays a perfect strategy, so you hold two assumptions: one about your own supposedly perfect strategy, and one about your opponent's strategy (which in practice may be very different from your assumption, because they have a different model with different errors!).
For example, in practice you may give very little weight to certain nodes in your strategy tree (and hence accept huge error margins there!) because of your belief about how your opponent should play, i.e. that those nodes shouldn't be reached very often, or at all. But your opponent's strategy may be very different from what you assume; those nodes would then be reached quite a lot, and you would lose a lot.
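The "no profitable unilateral deviation" property, and the way a flawed strategy becomes exploitable, can be checked concretely on a toy zero-sum game. A sketch using rock-paper-scissors (just an illustration of the definitions, not hold'em):

```python
# Rock-paper-scissors payoff matrix for the row player (+1 win, -1 loss).
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def expected_value(row_strat, col_strat):
    """Row player's expected payoff when both play mixed strategies."""
    return sum(row_strat[i] * col_strat[j] * PAYOFF[i][j]
               for i in range(3) for j in range(3))

def best_response_value(col_strat):
    """The most the row player can win against a fixed column strategy
    (the column strategy's exploitability)."""
    return max(sum(col_strat[j] * PAYOFF[i][j] for j in range(3))
               for i in range(3))

uniform = [1/3, 1/3, 1/3]    # the Nash equilibrium of RPS
biased  = [0.5, 0.25, 0.25]  # a flawed strategy -- a "model with errors"

print(best_response_value(uniform))  # → 0.0: no deviation beats the equilibrium
print(best_response_value(biased))   # → 0.25: a best response exploits the flaw
```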
Yes, I mentioned this article in particular, and also Noam Brown's comments on DeepStack's "alternative" approach in his other papers; see his papers on nested/safe search.
If you're interested in this project, please contact me at @yayegor on Telegram.
This is the exact algorithm without any changes vs. Slumbot, although I re-trained the river, turn, and flop models down to ~0.031 Huber loss (that took a lot of computation). Obviously, the sample size of ~3k hands is meaningless; probably the only conclusion is that this implementation is not getting crushed by Slumbot (and probably not crushing it either!), so it's probably robust.
I went through the hand histories, and as a professional poker player with 12+ years of experience I can tell that both bots showed a very solid game.
Hi yegorrr,
Did you also re-train this model, changing the algorithm based on the Supremus bot? I think it's not so difficult to change. I will give it a try... If you want to help, tell me. I'd appreciate it.
Does this match support 6 players?