Faster calculations in self_play.py

Open bibidybop opened this issue 2 years ago • 0 comments

  1. Simplified the UCB calculation. With a = parent.visit_count, b = pb_c_base (c2 in the paper) and c = pb_c_init (c1 in the paper): log((a+b+1)/b) + c = log(a+b+1) - log(b) + c = log1p(a+b) + K, where K = -log(b) + c = -log(pb_c_base) + pb_c_init is a constant. This takes fewer operations per call (one log1p and 2 additions, compared to one log, 3 additions and 1 division before), which improves speed.

  2. Updated the selection of children in def select_child. Originally, UCB values were calculated in 2 passes: once to determine the maximum UCB and a second time to build the list of children with the maximum UCB score. Instead, UCB can be calculated once per action and saved into a mapping (dictionary), so that the later search for children with the maximum UCB reads the pre-calculated values instead of computing UCB again.

  3. Updated the syntax of def add_exploration_noise so that children are accessed directly instead of via keys (e.g. self.children[a]).

  4. Added a function that selects a random element from a list using the built-in random module (faster than np.random.choice). According to my %timeit observations, the built-in random.random() approach takes 1 ms for a list of 100_000 elements, compared to 24 ms for np.random.choice() on the same list. It can be used when no selection probabilities are given (e.g. in UCB, or in select_action when temperature == inf).
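
The identity in item 1 can be checked numerically. The following is a minimal sketch, not the code from self_play.py: the function names are hypothetical, and the values 19652 and 1.25 for pb_c_base and pb_c_init are assumed here purely for illustration.

```python
import math

# Assumed illustrative values; the real ones come from the game config.
PB_C_BASE = 19652
PB_C_INIT = 1.25

# K = -log(pb_c_base) + pb_c_init, computed once up front.
K = PB_C_INIT - math.log(PB_C_BASE)


def exploration_original(parent_visits, pb_c_base=PB_C_BASE, pb_c_init=PB_C_INIT):
    """Original form: one log, three additions, one division."""
    return math.log((parent_visits + pb_c_base + 1) / pb_c_base) + pb_c_init


def exploration_fast(parent_visits, pb_c_base=PB_C_BASE, k=K):
    """Rewritten form: one log1p and two additions."""
    return math.log1p(parent_visits + pb_c_base) + k


if __name__ == "__main__":
    for visits in (0, 1, 50, 10_000):
        # Both forms should agree up to floating-point rounding.
        print(visits, exploration_original(visits), exploration_fast(visits))
```

If pb_c_base or pb_c_init can change at runtime, K has to be recomputed whenever they do.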

All of these changes aim to speed up the calculations, allowing for a faster MCTS search and faster generation of self-play games.
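
Items 2 and 4 can be illustrated together. This is a rough sketch under stated assumptions, not the actual select_child from self_play.py: it assumes children is a dictionary mapping each action to its child node, takes the scoring function as a hypothetical ucb_score parameter, and uses random.choice from the standard library to break ties.

```python
import random


def select_child(children, ucb_score):
    """Pick an action with maximal UCB, scoring each child exactly once.

    children:  dict mapping action -> child node (assumed shape)
    ucb_score: callable taking a child and returning its UCB value
               (hypothetical stand-in for the real scoring method)
    """
    # Single pass: compute and cache the UCB value of every action.
    scores = {action: ucb_score(child) for action, child in children.items()}

    # Reuse the cached values to find the maximum and all actions that reach it.
    best = max(scores.values())
    best_actions = [action for action, score in scores.items() if score == best]

    # Uniform tie-breaking with the standard-library random module.
    action = random.choice(best_actions)
    return action, children[action]


if __name__ == "__main__":
    # Toy example: the "child" is just its own score.
    toy_children = {0: 0.2, 1: 0.9, 2: 0.9, 3: 0.1}
    print(select_child(toy_children, lambda child: child))
```

The dictionary costs one extra allocation per call, but it avoids evaluating the UCB formula a second time for every child.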

bibidybop • Sep 03 '21 10:09