CrazyAra
High memory usage (original: Tree doesn't grow beyond 6 million Nodes)
Hi Johannes, First of all, congratulations on your latest release! This engine plays remarkably differently from any other I have seen, and it seems to do so at a very high level. I noticed an issue in infinite analysis. Once the engine hit somewhere between 5 and 6 million nodes it slowed down quite dramatically and didn't grow much beyond 6.3 million. Is this issue due to a RAM limit?
My PC:
- i7-4790k
- NVIDIA RTX 2070 Super
- 8 GB DDR3 RAM.
Thank you for your wonderful contribution to chess! Let me know how I can reach out to you for a potential collaboration. Tanmay Srinath.
Hello @magicianofriga , nice to hear that you like this project. Yes, it is possible to allocate more than 6 million nodes, and your assumption is right: you are running out of memory on your machine.
I just tested it on my desktop machine using the ClassicAra 0.9.5 executable. About 1 GiB is allocated on startup, when loading the CUDA, cuDNN and TensorRT libraries. The remaining memory is allocated dynamically over time.
$ ./CrazyAra_ClassicAra_MultiAra_0.9.5_Linux_TensorRT/ClassicAra
isready
position startpos
go infinite
...
info depth 41 seldepth 61 multipv 1 score cp 31 nodes 17000024 nps 14257 tbhits 0 time 1192437 pv d2d4 d7d5 c2c4 c7c6 b1c3 g8f6 c4d5 c6d5 g1f3 b8c6 c1f4 a7a6 e2e3 c8g4 h2h3 g4f3 d1f3 e7e6 f1d3 c6b4 d3b1 a8c8 e1g1 f8d6 f4g5 h7h6 g5h4 e8g8 f3e2 d6e7 f2f4 b4c6 g2g4 f6e8 h4g3 e8d6 f4f5 e7h4 g3h2 f8e8 f5e6
info string rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
bestmove d2d4
info string apply move to tree
root
# | Move | Visits | Policy | Q-values | CP | Type
-----+-------+--------------+-----------+------------+-------+------------
001 | d4 | 15916915 | 0.1981373 | 0.0998159 | 31 | UNSOLVED
000 | e4 | 531758 | 0.6484740 | 0.0529837 | 16 | UNSOLVED
002 | Nf3 | 106669 | 0.0669805 | 0.0752128 | 23 | UNSOLVED
003 | c4 | 51625 | 0.0401248 | 0.0676878 | 20 | UNSOLVED
010 | c3 | 26822 | 0.0014303 | -0.0103028 | -3 | UNSOLVED
019 | f3 | 26821 | 0.0000890 | -0.0903614 | -28 | UNSOLVED
008 | e3 | 26730 | 0.0041160 | 0.0041643 | 1 | UNSOLVED
009 | d3 | 26723 | 0.0017874 | -0.0385385 | -11 | UNSOLVED
005 | g3 | 26697 | 0.0102718 | 0.0302087 | 9 | UNSOLVED
007 | b3 | 26654 | 0.0052031 | -0.0252461 | -7 | UNSOLVED
013 | h4 | 26618 | 0.0006316 | -0.0421854 | -12 | UNSOLVED
016 | Na3 | 26590 | 0.0003413 | -0.0799976 | -24 | UNSOLVED
006 | Nc3 | 26544 | 0.0073106 | 0.0086331 | 2 | UNSOLVED
018 | h3 | 26524 | 0.0000947 | -0.0251692 | -7 | UNSOLVED
017 | Nh3 | 26500 | 0.0002794 | -0.0697391 | -21 | UNSOLVED
011 | b4 | 26486 | 0.0011651 | -0.0404012 | -12 | UNSOLVED
004 | f4 | 26422 | 0.0116823 | -0.0581965 | -17 | UNSOLVED
014 | g4 | 26377 | 0.0005855 | -0.1250571 | -39 | UNSOLVED
012 | a3 | 26321 | 0.0008925 | -0.0210689 | -6 | UNSOLVED
015 | a4 | 26233 | 0.0004028 | -0.0606963 | -18 | UNSOLVED
-----+-------+--------------+-----------+------------+-------+------------
initial value: 0.0941720
nodeType: UNSOLVED
isTerminal: 0
isTablebase: 0
unsolvedNodes: 20
Visits: 17032028
freeVisits: 32004/17032028
In this case, it allocated 17.1 GiB for 17 million nodes, i.e. about 1 KiB per node. That is honestly a lot.
Previously, we had the problem that one was only able to allocate up to 16.7 million nodes because of a variable overflow (https://github.com/QueensGambit/CrazyAra/issues/39). This has been fixed by changing the data type of the visits variable from float to uint32_t.
Now, you are in principle able to run about 4.3 billion (2^32 − 1 = 4,294,967,295) simulations.
Reducing the memory consumption is on the TODO list for future versions.
In the meantime, you can use the UCI option Nodes_Limit to limit the number of nodes you wish to allocate when using the go infinite command or during engine tournament play.
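For example, in a UCI session (the option name Nodes_Limit is taken from the engine; the value here is arbitrary):

```
setoption name Nodes_Limit value 5000000
go infinite
```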
> Let me know how I can reach you out for a potential collaboration.
I'm not sure if you are interested in collaborating via coding. If so, you can follow the build instructions in the wiki pages.
If you have a Linux system, you can use the update.sh shell script, which is the script that was used to install ClassicAra on TCEC.
Additionally, if you want to set up all dependencies to start reinforcement learning, you can make use of the Dockerfile.
If you want, you can start working on the memory issue; e.g., one approach would be to replace the vectors in the NodeData class by a struct of singular values.
https://github.com/QueensGambit/CrazyAra/blob/0f3d60f48fa914209664d74a9eba329c6fc4b54c/engine/src/nodedata.h#L90
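As a rough illustration of that idea (the names below are simplified placeholders; the real NodeData in engine/src/nodedata.h differs), the refactoring trades several parallel std::vector members for one vector of plain per-child structs:

```cpp
#include <cstdint>
#include <vector>

// Simplified sketch, not the actual CrazyAra types.
// Current-style layout: parallel vectors, each with its own heap
// allocation and capacity slack.
struct NodeDataVectors {
    std::vector<std::uint32_t> childVisits;
    std::vector<float> qValues;
    std::vector<float> policyProbs;
};

// Proposed-style layout: one contiguous vector of per-child entries,
// giving a single allocation per node and better cache locality.
struct ChildInfo {
    std::uint32_t visits;
    float qValue;
    float policyProb;
};

struct NodeDataCompact {
    std::vector<ChildInfo> children;
};
```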
However, I'm afraid that working on this problem is not ideal for newcomers to the project, as it requires a lot of refactoring and a good understanding of the code base.
You can also send me an email using the email address given on my profile page.
Hi Johannes, Thanks for the prompt response! With regards to coding, I am average at best. I was looking at contributing positions where Classic Ara seems to struggle compared to other top engines in the world i.e. add my chess expertise to the project. Let me know if that will be of use to you.
I will take a serious look at helping with reinforcement learning, though right now my focus is only on the Classic Ara part of your project as that holds maximum relevance to what I am doing (high level correspondence chess).
Best wishes, Tanmay Srinath.
Sure, creating a new issue which summarizes problematic positions for ClassicAra can help fix search problems. You can follow a similar structure as in:
- https://github.com/LeelaChessZero/lc0/issues/164
One common way to analyze a chess engine's strengths and weaknesses is to use test suites such as the Eigenmann Rapid Engine Test (ERET).
- https://www.chessprogramming.org/Test-Positions
- https://www.chessprogramming.org/Eigenmann_Rapid_Engine_Test
However, test suites cannot fully replace traditional Elo testing.
Hi Johannes! I would like to start training nets for ClassicAra. Can you point me to resources that would allow me to set up a training pipeline? Thanks!
Hello again @magicianofriga ! One way to start is to use the setup with the nvidia docker container.
- https://github.com/QueensGambit/CrazyAra/tree/master/DeepCrazyhouse/src/training#start-training-from-a-docker-container
However, this requires using Linux.
Alternatively, you may install the dependencies from the requirements.txt file.
- https://github.com/QueensGambit/CrazyAra/blob/master/DeepCrazyhouse/src/training/requirements.txt
If you want to use supervised learning, you need to create a data set from pgn files first.
- https://github.com/QueensGambit/CrazyAra/tree/master/DeepCrazyhouse/src/preprocessing/download_pgns
- https://github.com/QueensGambit/CrazyAra/blob/master/DeepCrazyhouse/src/preprocessing/convert_pgn_to_planes.ipynb
The configuration files can be found here:
- https://github.com/QueensGambit/CrazyAra/tree/master/DeepCrazyhouse/configs
You need to rename main_config_template.py to main_config.py for it to work.
An exemplary data set for crazyhouse can be downloaded here:
- https://github.com/QueensGambit/CrazyAra/wiki/Stockfish-10:-Crazyhouse-Self-Play
The current training is done in MXNet. However, a next step of this project is to add PyTorch training support. If you are familiar with PyTorch and like coding, you can start working on setting up a PyTorch training loop. Then you can open a PR if you like.
The class TrainerAgent could be converted into an abstract class and inherited by a new TrainerAgentPytorch class:
- https://github.com/QueensGambit/CrazyAra/blob/master/DeepCrazyhouse/src/training/trainer_agent.py
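That abstract-base-class split could look roughly like this (a minimal sketch with invented method names; the real TrainerAgent interface in trainer_agent.py is much larger):

```python
from abc import ABC, abstractmethod


class TrainerAgent(ABC):
    """Framework-agnostic training skeleton; hypothetical, simplified interface."""

    @abstractmethod
    def train_epoch(self, batches):
        """Run one epoch over the given batches and return the average loss."""


class TrainerAgentMXNet(TrainerAgent):
    def train_epoch(self, batches):
        # Real code would run MXNet's forward/backward passes here.
        return sum(batches) / len(batches)


class TrainerAgentPytorch(TrainerAgent):
    def train_epoch(self, batches):
        # Real code would iterate DataLoader batches and call
        # loss.backward() / optimizer.step() here.
        return sum(batches) / len(batches)
```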
This repository can be used as a reference:
- https://gitlab.com/jweil/PommerLearn/-/tree/master/pommerlearn
ONNX is used as the main network format right now to allow a flexible exchange between different DL frameworks. You can use Netron to investigate the neural network architecture:
- https://github.com/lutzroeder/netron
Thanks for the response! Will WSL (Windows Subsystem for Linux) work instead?
Sadly no, but I may be wrong about this.
Hi Johannes, Would it be fine if I shared the PGN for training with you? Since I don't have Linux at the moment, I am not sure how else I can contribute to the training process. What email address should I send it to? Thanks, Tanmay Srinath.
Hello Tanmay Srinath, you can use my email as shown on https://www.aiml.informatik.tu-darmstadt.de/people/jczech/ . I'm currently also in the process of adding PyTorch as a new neural-network framework back-end in order to start an RL run for classical chess.
Thanks! If it's a PyTorch trainer, perhaps I can use it on Windows as well. Also, I would love to write a script that automates compilation on Windows. How do you compile your releases on Windows? Can you share your existing script with me so that I can try to generalise it? Tanmay Srinath.
Hi, you can find the Linux compilation script for release 1.0.0 here:
- https://github.com/QueensGambit/CrazyAra/releases/download/1.0.0/update.sh
I added instructions for building CrazyAra on Windows in the wiki:
- https://github.com/QueensGambit/CrazyAra/wiki/3.-Build-CrazyAra-binary
Currently, I'm building the binaries for Windows manually according to the wiki pages. This usually involves updating the CUDA libraries as well.