
Performance is lacking

Open vrnvorona opened this issue 6 months ago • 1 comment

Your repo lists benchmarks, but running the Leduc CFR example on either CPU or GPU doesn't come anywhere close to that performance. On a 9950X I get about 30 it/s.

Please provide the benchmarking code for the cfrx side if the example is not the full picture (for example, if it doesn't use vmap extensively).

MCCFR gives 190 it/s on GPU and 9.4k it/s on CPU.

The fact that it runs faster on CPU is a strong sign of a problem :(

The GPU is a 5090, btw.

vrnvorona · Jul 06 '25 19:07

> Please provide the benchmarking code for the cfrx side if the example is not the full picture (for example, if it doesn't use vmap extensively).

I confirm the code used for benchmarking is no different from the one in the MCCFR trainer, except that I don't count the exploitability measurements (which is normal). In particular, no additional vmap was used.
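For reference, a minimal sketch of how an it/s figure of this kind could be measured. It is not the benchmarking code from the repo: `iterations_per_second` and `step` are hypothetical names, and `step` stands for one training iteration with exploitability evaluation left out.

```python
import time

def iterations_per_second(step, n_iters=1000, warmup=10):
    """Time a zero-argument callable `step` and return iterations per second.

    `step` is a hypothetical stand-in for one training iteration.
    With a jitted JAX function, `step` should block on its result
    (for example by calling `.block_until_ready()` on the output),
    otherwise only the asynchronous dispatch is timed.
    """
    for _ in range(warmup):  # keep compilation and cache warm-up out of the timing
        step()
    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed
```

Running the same harness on CPU and GPU, with the same warm-up, makes the two numbers directly comparable.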

> The fact that it runs faster on CPU is a strong sign of a problem :(
>
> The GPU is a 5090, btw.

No, it typically isn't. It runs faster on CPU because pgx envs are slower on GPU when they are not heavily parallelized. For benchmarking I used the vanilla version of MCCFR, which uses a single copy of the environment. If you wrote a version that leverages env parallelization, you would most likely see speedups.
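To illustrate what env parallelization means here, a minimal sketch using `jax.vmap` on a toy step function. `toy_step` is a hypothetical stand-in and not the pgx or cfrx API; the point is only that one compiled call advances many environment copies at once, which is the regime where a GPU tends to pay off.

```python
import jax
import jax.numpy as jnp

def toy_step(state, action):
    # Hypothetical stand-in for one environment step (not the pgx API).
    new_state = state + action
    reward = jnp.where(new_state % 2 == 0, 1.0, -1.0)
    return new_state, reward

# Map the single-env step over a leading "environment" axis and compile it once.
batched_step = jax.jit(jax.vmap(toy_step))

n_envs = 4096
states = jnp.zeros(n_envs, dtype=jnp.int32)
actions = jnp.arange(n_envs, dtype=jnp.int32) % 3

# One call steps all 4096 copies; block so the work has actually finished.
states, rewards = batched_step(states, actions)
rewards.block_until_ready()
```

With a single environment copy, each step is a tiny kernel and launch overhead dominates on GPU, which is consistent with the CPU being faster in the vanilla setup.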

> Your repo lists benchmarks, but running the Leduc CFR example on either CPU or GPU doesn't come anywhere close to that performance. On a 9950X I get about 30 it/s.
>
> MCCFR gives 190 it/s on GPU and 9.4k it/s on CPU.

That's a fair point. I had a hard time reproducing the result today since I hadn't touched the code for a while. I figured out it's a JAX problem; see https://github.com/jax-ml/jax/discussions/24501. Rolling back to jax==0.4.31 or setting the flag XLA_FLAGS=--xla_cpu_use_thunk_runtime=false gives the reported speed for me. Please tell me if that works for you.
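If setting the variable in the shell is inconvenient, the flag can also be set from Python, as long as this happens before JAX is imported. A minimal sketch, where `add_xla_flag` is a hypothetical helper that keeps any flags already present in `XLA_FLAGS`:

```python
import os

def add_xla_flag(flag):
    """Append `flag` to XLA_FLAGS unless it is already there (hypothetical helper)."""
    existing = os.environ.get("XLA_FLAGS", "").split()
    if flag not in existing:
        existing.append(flag)
    os.environ["XLA_FLAGS"] = " ".join(existing)

# Flag taken from the comment above; it must be set before `import jax`.
add_xla_flag("--xla_cpu_use_thunk_runtime=false")

# import jax  # import JAX only after the environment variable is in place
```

The alternative mentioned above is pinning the version, for example with `pip install jax==0.4.31`.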

Egiob · Jul 09 '25 13:07