gym-locm
Cannot fully reproduce the Coac vs Chad win rate reported in CEC 2020 using NativeAgent
The Coac vs Chad win rate was reported to be 57% in CEC 2020, but I obtained a win rate of ~80% using `locm-runner` and `NativeAgent`.
The evaluation command is:
```shell
locm-runner \
  --p1-path "/path/to/Strategy-Card-Game-AI-Competition/contest-2020-07-CEC/Coac/main" \
  --p2-path "/path/to/Strategy-Card-Game-AI-Competition/contest-2020-07-CEC/Chad/agent/target/release/agent" \
  --games 100
```
Here, I had commented out the `cerr <<` lines in Coac's code (e.g., here and other similar lines), because the `self._process.read_nonblocking` call in `NativeAgent` seemed to read both stdout and stderr (a known issue).
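If `_process` is a pexpect spawn object (an assumption based on the `read_nonblocking` name), the child's stdout and stderr likely share one pseudo-terminal, which would explain the mixing. A minimal sketch of the alternative, using Python's standard `subprocess` module with separate pipes, is below; the helper `run_agent_turn` and the one-shot exchange are hypothetical and not part of gym-locm.

```python
import subprocess
import sys


def run_agent_turn(command, turn_input, timeout=5.0):
    """Send one turn's input to an agent process and return (actions, debug).

    Hypothetical helper: stdout and stderr go through separate pipes, so
    anything the agent writes to stderr (e.g. C++ `cerr <<` debug lines)
    never ends up in the action string.
    """
    result = subprocess.run(
        command,
        input=turn_input,
        capture_output=True,  # stdout and stderr are captured separately
        text=True,
        timeout=timeout,
    )
    return result.stdout.strip(), result.stderr.strip()


if __name__ == "__main__":
    # A stand-in "agent" that prints debug info to stderr and an action to stdout.
    fake_agent = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('debug: thinking\\n'); print('PASS')",
    ]
    actions, debug = run_agent_turn(fake_agent, "")
    print(repr(actions))
    print(repr(debug))
```

A real agent is long-running and exchanges input every turn, so a faithful version would keep `subprocess.Popen` handles open across turns; the one-shot `subprocess.run` call here only illustrates the stream separation.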
And here are the printed results:
```
...
2022-05-30 22:43:51.392527 Episode 97: 79.38% 20.62%
2022-05-30 22:43:57.315334 Episode 98: 78.57% 21.43%
2022-05-30 22:44:03.639829 Episode 99: 78.79% 21.21%
2022-05-30 22:44:12.195598 Episode 100: 79.00% 21.00%
79.00% 21.00%
```
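To judge whether 79 wins out of 100 games is compatible with the reported 57%, one option is a Wilson score interval. This is a sketch, assuming the games are independent and treating the CEC 2020 figure as a fixed reference (it is itself an estimate from an unstated number of games); `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win rate (z=1.96 gives roughly 95%)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p_hat = wins / games
    denom = 1.0 + z * z / games
    center = (p_hat + z * z / (2 * games)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / games + z * z / (4 * games * games)
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # 79 wins out of 100 games, taken from the final line of the log above.
    low, high = wilson_interval(79, 100)
    print(f"Coac win rate 95% CI: [{low:.3f}, {high:.3f}]")
    print("Reported CEC 2020 rate of 0.57 inside CI:", low <= 0.57 <= high)
```

For 79/100 the interval comes out at roughly [0.70, 0.86], so 57% falls well outside it under these assumptions.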
See also the original discussion here.
I tried to run the consistency checks with Coac vs. Chad matches in the original Java engine, but they only work if both agents are deterministic. Sadly, Chad is not deterministic (its MCTS has a random component), and I couldn't find an easy way to seed its RNG (I don't know Rust :P).
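Why seeding matters for a consistency check can be shown with a toy example: replaying the same match twice only yields identical action traces if every random choice comes from an RNG with a fixed seed. The sketch below is a hypothetical stand-in written in Python, not Chad's Rust code; `ToyStochasticAgent`, `play_trace`, `consistent` and the placeholder action strings are all illustrative.

```python
import random


class ToyStochasticAgent:
    """Hypothetical stand-in for an MCTS-style agent whose choices use an RNG.

    Passing a seed makes every decision reproducible; passing None seeds
    from system entropy, so two runs will almost surely diverge.
    """

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def act(self, legal_actions):
        # A real MCTS agent would sample rollouts; here we just sample one action.
        return self.rng.choice(legal_actions)


def play_trace(agent, turns=20):
    """Record the agent's choices over a fixed sequence of toy game states."""
    legal = ["PASS", "SUMMON 1", "ATTACK 2 -1", "USE 3 4"]  # placeholder actions
    return [agent.act(legal) for _ in range(turns)]


def consistent(seed):
    """Consistency-check style comparison: two runs must give identical traces."""
    return play_trace(ToyStochasticAgent(seed)) == play_trace(ToyStochasticAgent(seed))


if __name__ == "__main__":
    print("seeded runs match:", consistent(42))
```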
Running 200 games of Coac vs. Chad using the competition's `run.sh` script, Coac achieved a win rate of 68%, which is significantly higher than the reported 57%. This may be due to hardware differences (?) between my computer and the one(s) used in the competition, since Marasbot from earlier editions also achieved different win rates on my computer. However, when using `locm-runner` to execute the same matches, Coac ended up with a win rate of 78%, as reported by OP, which suggests that there may also be something wrong with the `NativeAgent` class and/or the engine (although the engine seems to be correct, considering the other consistency checks I've run).
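Whether the gap between the `run.sh` result (68%) and the `locm-runner` result (78%) is larger than sampling noise can be checked with a standard two-proportion z-test. A minimal sketch, assuming independent games and assuming the `locm-runner` run also used 200 games (the comment says "the same matches" but does not state the count); `two_proportion_z_test` is a hypothetical helper.

```python
import math


def two_proportion_z_test(wins_a, games_a, wins_b, games_b):
    """Two-sided z-test for equal win rates, using the pooled proportion."""
    p_a = wins_a / games_a
    p_b = wins_b / games_b
    pooled = (wins_a + wins_b) / (games_a + games_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / games_a + 1 / games_b))
    if se == 0:
        # All games won (or all lost) on both sides: no evidence of a difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


if __name__ == "__main__":
    # run.sh: 68% of 200 games. locm-runner: 78%, game count ASSUMED to be 200.
    z, p = two_proportion_z_test(136, 200, 156, 200)
    print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Under these assumptions the test gives roughly z ≈ 2.25 and p ≈ 0.02, so the difference between the two harnesses would be unlikely to come from sampling noise alone.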
For now, I'll leave this issue hanging. I'll come back if I think of other ideas for debugging this match-up.