TexasSolver
Inquiry for possible performance improvement
The performance of this repo is already amazing, but I wanted to ask a question. Have you checked the family of improvements defined in this paper? (https://realworld-sdm.github.io/paper/27.pdf) It derives existing algorithms like CFR+ or DCFR by computing "instant updates" to the counterfactual value, the regret and the strategy. I don't know whether this would add a lot of complexity to the existing codebase, but it can, for example, yield even faster convergence. This would make CFR+ converge faster than DCFR without worrying about tuning alpha, beta and gamma.
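For what it's worth, here is a rough sketch of how I understand the instant-update idea at a single infoset: after the regrets are updated with this iteration's action values, the new strategy σ^{t+1} is computed immediately and the value returned to the parent is re-weighted with it. This is only my reading of the paper, not its exact pseudocode; regretMatching and instantUpdateNodeValue are illustrative names, not functions from this repo.

#include <vector>
#include <algorithm>

// Regret matching on positive regrets (CFR+-style).
std::vector<float> regretMatching(const std::vector<float>& regrets) {
    std::vector<float> strategy(regrets.size());
    float sum = 0.0f;
    for (size_t a = 0; a < regrets.size(); ++a) {
        strategy[a] = std::max(regrets[a], 0.0f);
        sum += strategy[a];
    }
    for (size_t a = 0; a < strategy.size(); ++a)
        strategy[a] = sum > 0 ? strategy[a] / sum : 1.0f / strategy.size();
    return strategy;
}

// One infoset update. actionValues[a] is the counterfactual value of action a
// obtained from the children during this iteration.
float instantUpdateNodeValue(std::vector<float>& regrets,
                             const std::vector<float>& actionValues) {
    // Node value under the old strategy sigma^t.
    std::vector<float> oldStrategy = regretMatching(regrets);
    float oldValue = 0.0f;
    for (size_t a = 0; a < actionValues.size(); ++a)
        oldValue += oldStrategy[a] * actionValues[a];

    // Regret update (clipped at zero, as in CFR+).
    for (size_t a = 0; a < regrets.size(); ++a)
        regrets[a] = std::max(regrets[a] + actionValues[a] - oldValue, 0.0f);

    // "Instant" step: compute sigma^{t+1} right away and return the node value
    // re-weighted with it, instead of oldValue.
    std::vector<float> newStrategy = regretMatching(regrets);
    float newValue = 0.0f;
    for (size_t a = 0; a < actionValues.size(); ++a)
        newValue += newStrategy[a] * actionValues[a];
    return newValue;
}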
No, I haven't read the paper, will read it. Sounds promising.
I don't understand step (5): where should the instant counterfactual value updated by σ^{t+1} be used?
// ICFR: recompute the payoffs here using the new strategy
const vector<float> current_strategy_new = trainable->getcurrentStrategy();
fill(payoffs.begin(), payoffs.end(), 0);
// collect the per-action utilities, weighted by the new strategy
for (int action_id = 0; action_id < actions.size(); action_id++) {
    vector<float>& action_utilities = results[action_id];
    if (action_utilities.empty())
        continue;
    for (int hand_id = 0; hand_id < action_utilities.size(); hand_id++) {
        float strategy_prob = current_strategy_new[hand_id + action_id * node_player_private_cards.size()];
        payoffs[hand_id] += strategy_prob * action_utilities[hand_id];
    }
}
Add this to the end of actionUtility(); it does improve performance on some boards, such as 6h6c6d, 7d7h2h...
I believe you need the value from step (5) to proceed with the parent-node calculations. I don't understand it very well, which is why I opened an issue instead of coding it myself and submitting a pull request.
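If I understand the thread correctly, step (5) only changes the weighting used when the per-action values are collapsed into the node value that is passed up to the parent (this is my interpretation, not a quote from the paper):

v^t(I) = \sum_a \sigma^t(I, a) \, v^t(I, a)            (standard CFR / CFR+)
\tilde{v}^t(I) = \sum_a \sigma^{t+1}(I, a) \, v^t(I, a)   (instant update: weight with the new strategy)

The second sum is exactly what the payoffs recomputation in the snippet above computes for each hand.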
It seems the payoffs need to be recalculated using the new strategy. I tried it: in some cases, such as the benchmark settings, it converges faster, but in large-scale games it works worse, so maybe I misunderstood something.