TexasSolver

Inquiry for possible performance improvement

Open EddieMataEwy opened this issue 1 year ago • 5 comments

The performance of this repo is already amazing, but I wanted to ask a question. Have you checked the family of improvements defined in this paper? (https://realworld-sdm.github.io/paper/27.pdf) It derives existing algorithms like CFR+ or DCFR by computing "instant updates" to the counterfactual value, the regret and the strategy. I don't know if this would add a lot of complexity to the existing codebase, but it allows, for example, for even faster convergence. This would make CFR+ converge faster than DCFR without worrying about tuning alpha, beta and gamma.
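For context on the alpha, beta and gamma mentioned above, here is a minimal sketch of the discounting used in the standard DCFR formulation, as I understand it: accumulated positive regrets are scaled by t^alpha / (t^alpha + 1), negative regrets by t^beta / (t^beta + 1), and average-strategy contributions by (t / (t + 1))^gamma. The names `DcfrParams`, `discountRegret` and `strategyWeight` are hypothetical and are not taken from this repo, and the defaults are only the commonly cited values.

```cpp
#include <cmath>

// Hypothetical parameter bundle; defaults are the commonly cited DCFR values,
// not necessarily what this repo uses.
struct DcfrParams {
    double alpha = 1.5;  // discount exponent for positive cumulative regret
    double beta = 0.0;   // discount exponent for negative cumulative regret
    double gamma = 2.0;  // discount exponent for the average strategy
};

// Scale a cumulative regret at iteration t (t >= 1) by t^e / (t^e + 1),
// where e is alpha for positive regret and beta otherwise.
double discountRegret(double cumulative_regret, int t, const DcfrParams& p) {
    const double exponent = cumulative_regret > 0.0 ? p.alpha : p.beta;
    const double w = std::pow(static_cast<double>(t), exponent);
    return cumulative_regret * w / (w + 1.0);
}

// Weight applied to the average-strategy contribution at iteration t.
double strategyWeight(int t, const DcfrParams& p) {
    const double ratio = static_cast<double>(t) / (static_cast<double>(t) + 1.0);
    return std::pow(ratio, p.gamma);
}
```

These three exponents are the knobs that would no longer need tuning if CFR+ with instant updates really does converge faster.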

EddieMataEwy avatar Feb 25 '23 22:02 EddieMataEwy

No, I haven't read the paper. I will read it. Sounds promising.

bupticybee avatar Feb 27 '23 01:02 bupticybee

I don't understand step (5): where should the instant counterfactual value, updated with σ^{t+1}, be used?

xuzy1975 avatar Mar 02 '23 18:03 xuzy1975

      // ICFR needs to update payoffs here using the new strategy
      const vector<float> current_strategy_new = trainable->getcurrentStrategy();
      fill(payoffs.begin(), payoffs.end(), 0);
      // Collect the per-action utilities, weighted by the new strategy
      for (int action_id = 0; action_id < actions.size(); action_id++) {
          vector<float>& action_utilities = results[action_id];
          if (action_utilities.empty())
              continue;
          for (int hand_id = 0; hand_id < action_utilities.size(); hand_id++) {
              float strategy_prob = current_strategy_new[hand_id + action_id * node_player_private_cards.size()];
              payoffs[hand_id] += strategy_prob * action_utilities[hand_id];
          }
      }

I added this to the end of actionUtility(). It does improve performance on some public boards, such as 6h6c6d, 7d7h2h...
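The same recomputation can be sketched as a standalone function, which makes it easier to check in isolation. This is only an illustration under assumptions: `recomputePayoffs` and its parameters are hypothetical names, not part of the codebase, and the strategy is assumed to be laid out as `strategy[hand_id + action_id * num_hands]`, as in the snippet above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: recomputes each hand's node payoff as the
// strategy-weighted sum of its per-action utilities.
// Assumed layout: strategy[hand_id + action_id * num_hands].
std::vector<float> recomputePayoffs(
        const std::vector<std::vector<float>>& action_utilities,
        const std::vector<float>& strategy,
        std::size_t num_hands) {
    std::vector<float> payoffs(num_hands, 0.0f);
    for (std::size_t action_id = 0; action_id < action_utilities.size(); ++action_id) {
        const std::vector<float>& utilities = action_utilities[action_id];
        if (utilities.empty())
            continue;  // actions without results are skipped, as in the snippet
        for (std::size_t hand_id = 0;
             hand_id < utilities.size() && hand_id < num_hands; ++hand_id) {
            const float strategy_prob = strategy[hand_id + action_id * num_hands];
            payoffs[hand_id] += strategy_prob * utilities[hand_id];
        }
    }
    return payoffs;
}
```

For example, with two hands, utilities {1, 2} for the first action and {3, 4} for the second, and strategy {0.5, 0.25, 0.5, 0.75}, the payoffs come out as {2, 3.5}.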

xuzy1975 avatar Mar 03 '23 07:03 xuzy1975

I believe you need calculation 5 to proceed with parent node calculations. I don't understand it very well. That is why I opened an issue instead of coding it myself and doing a pull request.

EddieMataEwy avatar Mar 04 '23 19:03 EddieMataEwy

It seems the payoff needs to be recalculated using the new strategy. I tried it: in some cases, such as the benchmark settings, it converges faster, but in large-scale games it performs worse. Maybe I misunderstood something.

xuzy1975 avatar Mar 05 '23 16:03 xuzy1975