muzero-general
MuZero
Hey, I'm wondering whether there are any plans to extend the codebase to MuZero Unplugged so that it works in an offline RL setting?
The MCTS implementation here works roughly like this (pseudocode):

```python
def mcts(observation):
    root_predicted_value, stuff = model.initial_inference(observation)
    root = Node()
    root.expand(stuff)
    root.add_exploration_noise()
    for _ in range(num_simulations):
        leaf = find_unexpanded_leaf()  # here...
```
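`root.add_exploration_noise()` in the pseudocode is the root-only exploration step. In the published MuZero pseudocode it mixes Dirichlet noise into the root's prior probabilities, controlled by `root_dirichlet_alpha` and `root_exploration_fraction`. The sketch below illustrates that mixing on its own, assuming the priors are stored as a plain dict from action to probability. The function name, signature, and default values are hypothetical and illustrative, not taken from this repository's `Node` class.

```python
import random
from typing import Dict, Hashable

def add_exploration_noise(
    priors: Dict[Hashable, float],
    dirichlet_alpha: float = 0.3,
    exploration_fraction: float = 0.25,
    rng=random,
) -> Dict[Hashable, float]:
    """Return root priors mixed with Dirichlet noise (the input dict is not modified).

    priors: action -> prior probability, assumed to sum to 1.
    rng: anything with a gammavariate(alpha, beta) method, e.g. random.Random(seed).
    """
    actions = list(priors)
    # A Dirichlet(alpha, ..., alpha) sample is a vector of independent
    # Gamma(alpha, 1) draws normalized to sum to 1.
    gammas = [rng.gammavariate(dirichlet_alpha, 1.0) for _ in actions]
    total = sum(gammas)
    noise = [g / total for g in gammas]
    # Convex combination of the network prior and the noise.
    frac = exploration_fraction
    return {a: priors[a] * (1 - frac) + n * frac for a, n in zip(actions, noise)}
```

Because both the priors and the noise sum to 1, the mixed priors still sum to 1. Setting `exploration_fraction=0` returns the priors unchanged, which is how noise is typically disabled outside self-play.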
Sorry, I'm a newbie. How would I implement this so that it runs on a Procgen environment? Thank you.
Hello, every two-player game implemented here is turn-based. Could you provide an example of, or advice on, how to make a game where both players move simultaneously? Also,...
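One common way to fit simultaneous moves into a framework built around a single action per step is to treat the pair of moves as one joint action from the product of the two action spaces. The sketch below shows only that encoding, under that assumption; it is not something muzero-general provides, and the helper names are hypothetical. It handles bookkeeping only: a single searcher picking the joint action effectively controls both players, so for an adversarial game you would still need to decide where the second player's move comes from (for example, an opponent policy inside the environment).

```python
from typing import Tuple

def encode_joint_action(action_p1: int, action_p2: int, num_actions_p2: int) -> int:
    """Map a pair of simultaneous moves to one index in [0, n1 * n2)."""
    if not 0 <= action_p2 < num_actions_p2:
        raise ValueError("action_p2 out of range")
    return action_p1 * num_actions_p2 + action_p2

def decode_joint_action(joint_action: int, num_actions_p2: int) -> Tuple[int, int]:
    """Recover (action_p1, action_p2) from a joint index."""
    return divmod(joint_action, num_actions_p2)
```

With 3 moves per player, as in rock-paper-scissors, the joint action space has 9 entries, and `decode_joint_action(7, 3)` gives `(2, 1)`. The environment's step function can then decode the index and resolve both moves at once.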
If I have access to the environment model, is it faster or better to train AlphaZero instead? Thanks.
Hello, I've been having issues running self-play on the GPU, and after about a week of experimentation I've realized that this option is necessary if I want...
Hi guys, I'm having a problem getting this to run on a Ray cluster. It looks like it's working, at least, but it keeps throwing a...
Here is the error:

```
Last test reward: 500.00. Training step: 8449/10000. Played games: 13. Loss: 3.64
2021-12-31 01:53:03,200 WARNING worker.py:1245 -- A worker died or was killed while executing a...
```
The changes fall into four themes:

- `prediction_policy_network` outputs 2 × the action-space size: one mean and one standard deviation for each joint. The log-probability is calculated for each joint and then summed.
- `dynamics_encoded_state_network` function...
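The first theme describes a diagonal Gaussian policy: each joint has its own mean and standard deviation, and because the joints are treated as independent, the log-probability of a full action is the sum of the per-joint log-densities. A minimal stdlib sketch follows. It assumes the first half of the 2 × action-space output holds the means and the second half holds log standard deviations (exponentiated to keep them positive); the PR text does not say how the standard deviation is parameterized, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def split_policy_output(output: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split a 2*n vector into n means and n standard deviations.

    Assumption: second half holds log-std values, so exp() makes every std positive.
    """
    n = len(output) // 2
    if len(output) != 2 * n:
        raise ValueError("policy output length must be even (2 * action dimension)")
    means = list(output[:n])
    stds = [math.exp(s) for s in output[n:]]
    return means, stds

def gaussian_log_prob(x: float, mean: float, std: float) -> float:
    """Log-density of a 1-D normal distribution at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)

def joint_log_prob(action: Sequence[float], means: Sequence[float], stds: Sequence[float]) -> float:
    """Independent joints: the joint log-probability is the sum of per-joint terms."""
    return sum(gaussian_log_prob(a, m, s) for a, m, s in zip(action, means, stds))
```

With means `[0, 1]`, unit standard deviations, and the action `[0, 1]`, each joint contributes $-\tfrac12\log(2\pi)$, so the sum is $-\log(2\pi)$.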
1. Improved the UCB calculation: $\log\frac{a+b+1}{b} + c = \log(a+b+1) - \log b + c = \mathrm{log1p}(a+b) + K$, where $K = c - \log b$, and $a$ is `parent.visit_count`, $b$ is `pb_c_base` ($c_2$ in the paper), $c$ is `pb_c_init` ($c_1$ in...
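The identity can be checked numerically. Since $K = c - \log b$ depends only on `pb_c_base` and `pb_c_init`, it can be computed once per configuration instead of on every UCB evaluation. The sketch below compares the original form with the rewritten one; the function names are hypothetical, and 19652 and 1.25 are the `pb_c_base` and `pb_c_init` values from the MuZero paper's pseudocode.

```python
import math
from typing import Callable

def pb_c_direct(parent_visit_count: int, pb_c_base: float, pb_c_init: float) -> float:
    """Original form: log((a + b + 1) / b) + c."""
    return math.log((parent_visit_count + pb_c_base + 1) / pb_c_base) + pb_c_init

def make_pb_c(pb_c_base: float, pb_c_init: float) -> Callable[[int], float]:
    """Rewritten form: log1p(a + b) + K, with K = c - log(b) precomputed once."""
    k = pb_c_init - math.log(pb_c_base)

    def pb_c(parent_visit_count: int) -> float:
        return math.log1p(parent_visit_count + pb_c_base) + k

    return pb_c
```

Both forms agree to floating-point precision; the rewrite saves a division per call, since the `log(b)` term is folded into `K`.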