
open_spiel/python/examples/deep_cfr_pytorch.py fails to converge

Open lawrencewlcknight opened this issue 3 months ago • 15 comments

The example given in open_spiel/python/examples/deep_cfr_pytorch.py fails to converge to the known average policy value of 1/18. It looks like this was tested at one point and converged as expected.

lawrencewlcknight avatar Oct 05 '25 22:10 lawrencewlcknight

Thanks for letting us know.

This is likely due to multiple upgrades to pytorch and we haven't gone back to check. Can you try with different hyper-parameters?

If we can't get it to work again, we'll need to remove it as it is unfortunately unmaintained.

lanctot avatar Oct 09 '25 12:10 lanctot

I'll check it if the OP hasn't got time.

alexunderch avatar Oct 13 '25 09:10 alexunderch

hello

AronSchiffl avatar Oct 15 '25 09:10 AronSchiffl

@lanctot the torch implementation tends not to converge, even though I tweaked the basic hyperparameters (number of iterations, number of games played, and the network size): the resulting exploitability is 0.9 ± 0.1.

I tried to match the results with the JAX implementation, but it outputs an interesting warning:

W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence

So the reference Deep CFR implementations are in a somewhat difficult state right now.

P.S. I used Python 3.12; maybe that's why the warnings appear, but still.

alexunderch avatar Oct 16 '25 10:10 alexunderch

@lawrencewlcknight, did you make any attempts to fix this, or did you decide to leave the issue as it is?

alexunderch avatar Nov 11 '25 10:11 alexunderch

Thanks for the effort @alexunderch.

This was working once before, but that was several years ago. My guess is something changed in a PyTorch upgrade (there was a major upgrade and the code wasn't updated to match).

Unfortunately we don't have the resources to ensure the continued correctness of the pytorch algorithm implementations because we don't actively use them. So we have to rely on community help for this.

Unless someone finds the cause, I will mark it as unmaintained and probably remove it in the next version.

lanctot avatar Nov 17 '25 19:11 lanctot

I'll add the "contribution welcome" label and wait a few weeks to see if any PyTorch experts want to take a deeper look.

lanctot avatar Nov 17 '25 19:11 lanctot

The JAX implementation also needs some tweaking/upgrading, because of the Haiku deprecation and the strange TF datasets warnings.

alexunderch avatar Nov 17 '25 19:11 alexunderch

> The JAX implementation also needs some tweaking/upgrading, because of the Haiku deprecation and the strange TF datasets warnings.

That's another set of things we'll have to deprecate unless we get a volunteer to change them all to flax/nnx/linen. Is that "easy", do you think? Don't go off and try just yet; I'm just curious how hard/easy you think it'd be.

I'm not sure we should keep doing this every few years unless it's code that gets wide use... and Deep CFR feels like exactly the kind of thing on the borderline (I'd be happy to scrap the PyTorch version unless someone fixes it; a JAX version feels nice to have but not essential).

Maybe we should start with the easier one that will surely get wide use: JAX DQN. And I bet this exercise could help figure out any of the current issues with the AlphaZero port.

lanctot avatar Nov 17 '25 19:11 lanctot

It is definitely not hard to do, because it is mostly refactoring. However, to make things work, we need a plan or roadmap, just due to the sheer number of things to upgrade.

I think we can outline the first things to work on and rewrite the flax/linen and pytorch versions in a unified manner. I am a big fan of the systematic approach.

alexunderch avatar Nov 17 '25 20:11 alexunderch

Hi @lanctot and @alexunderch, I’d be happy to take a look at this. I’m new to this codepath, so I may need a bit of time to reproduce the issue first — hope that’s okay.

@alexunderch thanks for testing the hyperparameters, that’s super helpful — I’ll continue from there.

(And longer term, I’m also interested in the Haiku→Flax refactor mentioned above and would be happy to explore that as I get more familiar with the code.)

fuyuan-li avatar Dec 02 '25 04:12 fuyuan-li

Hi, I forgot about the issue – go ahead.

alexunderch avatar Dec 02 '25 05:12 alexunderch

Thanks @fuyuan-li... I had mentally marked it for deletion. If you can find the problem, I'd be happy to keep it.

lanctot avatar Dec 02 '25 11:12 lanctot

Hi @lanctot and @alexunderch, I want to give a quick update before I submit a PR (and hear your thoughts).

Straight answer: I changed a few hyperparameters to make the model converge, but these were not random tweaks. Each change targets a specific issue observed during debugging:

  (a) set reinitialize_advantage_networks=False
  (b) increase policy_network_train_steps
  (c) increase num_traversals

Before assuming it’s “just hyperparameters,” I ran a few small diagnostics to confirm the implementation itself is fine:

  1. Advantage network: it can overfit a single sample perfectly → the infrastructure works
  2. Single info_state batches: the network reaches the theoretical minimum MSE given the regret variance → training is fine
  3. Policy network: when trained separately, the loss decreases smoothly → this component is also fine
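
Diagnostic 1 above can be illustrated with a toy stand-in (everything here is hypothetical: a plain linear model rather than the actual deep_cfr_pytorch.py advantage network): if the training loop is wired correctly, even a single (info_state, regret) sample should be fit to essentially zero loss.

```python
import numpy as np

# Toy "advantage net" (a plain linear map, NOT the real network) overfitting
# one (info_state, regret) sample by gradient descent on the MSE.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(2, 11))  # 11-dim info-state encoding -> 2 actions
x = rng.normal(size=11)                  # one sampled info_state
y = np.array([0.5, -0.5])                # one sampled-regret target
for _ in range(2000):
    error = W @ x - y
    W -= 0.01 * np.outer(error, x)       # gradient of 0.5 * ||W x - y||^2
mse = np.mean((W @ x - y) ** 2)
print(mse)  # ~0: if a single sample cannot be fit, the infrastructure is broken
```

If this kind of sanity check fails, the bug is in the training plumbing, not the hyperparameters.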

So the problem seems to be a mixture of two things: (a) high-variance sampled regrets in a tiny game (not many info_states) make life hard for the advantage network (I collected ~5k regrets from one single info_state, and their per-axis stds are [0.6781, 0.9935], with values ranging from -3 to 3); and (b) the policy network needs more training to converge (currently it is trained only once by default).
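
Point (a) has a consequence worth spelling out: when many sampled regrets share the same info_state, the best any regressor can do is predict their mean, so the per-axis variance of the targets is an irreducible floor on the training MSE (this is the "theoretical minimum MSE" from diagnostic 2). A minimal numpy illustration, with made-up numbers of the same order as those quoted above:

```python
import numpy as np

# Many sampled regrets for one info_state: the optimal constant prediction is
# their mean, so the minimum achievable MSE equals the target variance.
rng = np.random.default_rng(0)
regrets = rng.normal(loc=0.2, scale=0.8, size=5000)  # ~5k samples, std ~0.8
best_prediction = regrets.mean()
min_mse = np.mean((regrets - best_prediction) ** 2)
print(min_mse, regrets.var())  # essentially identical: variance is the loss floor
```

With per-axis stds around 0.68–0.99, the floor is roughly 0.46–0.99 MSE per axis, so a large advantage-network loss does not by itself indicate a bug.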

I also ran 40 simulations to test the policy values: -0.058 (SD = 0.029), close to the theoretical -1/18 ≈ -0.056. I know the variance is still noticeable, but the mean passes a t-test. (I set a random seed, so this is reproducible.)
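
For reference, the t-test can be recomputed directly from the quoted summary statistics (mean -0.058, SD 0.029, n = 40); this is only a sketch of the arithmetic, not the actual test script:

```python
import math

# One-sample t statistic against the theoretical Kuhn poker value -1/18,
# using the summary statistics quoted in the comment above.
mean, sd, n = -0.058, 0.029, 40
theoretical = -1.0 / 18.0                     # ~ -0.0556
t_stat = (mean - theoretical) / (sd / math.sqrt(n))
print(round(t_stat, 2))                       # ~ -0.53
```

With |t| ≈ 0.53, far below the two-sided 5% critical value (~2.02 at df = 39), the simulated mean is statistically indistinguishable from -1/18.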

Let me know if the above looks right. If so, I will submit my PR.

By the way, I also noticed the JAX and TF2 versions are not working. Not sure if they're expected to be maintained. If you think it's worth fixing, I'm happy to take a look (or help on other higher-priority issues, if any).

Thanks!

reproducible_experiment.html

fuyuan-li avatar Dec 10 '25 13:12 fuyuan-li

@fuyuan-li wow this is even more than I could ask for! Amazing work, and thanks for the thorough job.

Thanks for the PR; this is certainly the right track. My guess is the values chosen for those settings were probably optimized for a different game (maybe Leduc?) but it's good to have the code in a working state for Kuhn poker.

TF2 version: we have deprecated support for TF, so we have removed this. No need to look into that.

Jax version: we should probably fix this too. I would hope that it's easy enough to port these changes over? Wdyt? But we can move that discussion to the PR.

Again, thanks so much, you saved this code from getting deleted!

lanctot avatar Dec 11 '25 21:12 lanctot