ELF
Updated ELF still returning exceeded memory error
Using two-gtp: https://www.mankier.com/1/gogui-twogtp
System spec: 150 GB RAM, Tesla V100 GPU
The job is killed after about 2.5 games, using the following settings:
./gtp.sh ~/v1.bin --gpu 0 --num_block 20 --dim 224 --mcts_puct 1.5 --batchsize 2 --mcts_rollout_per_batch 2 --mcts_threads 2 --mcts_rollout_per_thread 250 --resign_thres 0.00 --mcts_virtual_loss 1
From the log:
[2018-10-17 17:46:47.497] [elf::ai::tree_search::MCTSAI_T-22] [info] [-1] MCTSAI Result: BestA: [B9][bi][191], MaxScore: 3, Info: -2.97157/3 (-0.990524), Pr: 0.0101511, child node: 21109 Action: 191 MCTS: 1239.9ms. Total: 1239.9ms.
B<< B<< = B9 B<< B<< W>> play B B9 W<< W<< = W<< W<< W>> genmove w
slurmstepd: error: Job [omitted] exceeded memory limit (153936196 > 153600000), being killed
slurmstepd: error: Exceeded job memory limit
slurmstepd: error: *** JOB [omitted] ON [omitted] CANCELLED AT 2018-10-18T04:46:48
Could this be an error on the twogtp side? I'm not sure how to play games without twogtp using just the ELF system for competitive play (not self-play training).
#100 #94
Using the command you provided, I observe that memory usage reaches an asymptote of roughly 2.5 GB.
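For anyone who wants to reproduce that measurement, a minimal sketch of sampling a process's resident set size over time (assuming a Linux system with procps `ps`; `$$`, this shell's own PID, is used here purely as a stand-in for the ELF player's PID):

```shell
#!/bin/bash
# Sample the resident set size (RSS) of a process a few times.
# $$ is a placeholder PID; substitute the PID of the ELF/twogtp
# process to watch its memory usage approach an asymptote.
pid=$$
for n in 1 2 3; do
  rss_kb=$(ps -o rss= -p "$pid" | tr -d ' ')
  echo "sample $n: ${rss_kb} kB"
  sleep 1
done
```

Logging this once per second over a full game is enough to see whether usage flattens out around 2.5 GB or keeps climbing.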
Not sure whether twogtp would cause any issues. cc @qucheng
I found a workaround: running games individually so that memory usage resets between games. If I get some spare time, I may confirm later whether it was twogtp-related.
For those who are interested, this is the command I ran to work around it (you can adjust the number of games etc. depending on how much memory is being used up):
#!/bin/bash
BLACK="player_b.sh"
WHITE="player_w.sh"
for i in {1..50}
do
  ./gogui-twogtp -black "$BLACK" -white "$WHITE" -games 1 \
    -size 19 -sgffile game_filename_$i -auto -verbose -debugtocomment -komi 7.5
done
twogtp will just run 2 copies of ELF, which will consume ~5 GB. That might exceed the memory limit on some hardware.
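As a sanity check on that ~5 GB figure, one can sum the resident memory of the running player processes. A hedged sketch (assuming procps `ps`, and that the player processes match a pattern like `gtp.sh` — adjust the pattern to your setup):

```shell
#!/bin/bash
# Sum resident memory (in MB) across all processes whose command name
# matches a pattern. With two ELF players running, the total should
# land near the ~5 GB mentioned above.
pattern="${1:-gtp.sh}"
ps -eo rss=,comm= | awk -v pat="$pattern" '
  $2 ~ pat { total += $1 }
  END { printf "total RSS for %s: %.1f MB\n", pat, total / 1024 }'
```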
For experimental reasons, would it be possible to direct me to the code that removes the entire subtree created by the AI after each move (not just the unused portion)?
@downseq Removing --persistent_tree
would clean up the existing tree before each move. See here: https://github.com/pytorch/ELF/blob/master/src_cpp/elf/ai/tree_search/mcts.h#L142
Thanks for the confirmation.
So it seems that keeping the subtree after each move is not enabled by default, unlike in the AlphaGo Zero paper?
@downseq It is always helpful. If memory allows, it should always boost performance at zero additional cost. So why not?
I think there is still some confusion about whether it is on by default, but it seems that it is, so maybe I misinterpreted your earlier comment.