
MuZero

Results 58 muzero-general issues

Hey, I'm wondering whether there is any intention to extend the codebase to MuZero Unplugged so that it works in an offline RL setting?

enhancement
question

The MCTS implementation here works roughly like this (pseudocode):

```python
def mcts(observation):
    root_predicted_value, stuff = model.initial_inference(observation)
    root = Node()
    root.expand(stuff)
    root.add_exploration_noise()
    for _ in range(num_simulations):
        leaf = find_unexpanded_leaf()
        # here...
```

enhancement

Sorry, I'm a newbie. How would I implement this so that it runs on a Procgen environment? Thank you.

enhancement
question

Hello, every two-player game implemented is turn-based. Do you mind providing an example or advising on how to make a game where both players move simultaneously? Also,...

question

If I have access to the environment model, is it faster or better to train AlphaZero instead? Thanks.

question

Hello, I've been having issues with doing self-play on GPU, and after about a week of experimentation I've realized that it is necessary to use this option if I want...

enhancement

Hi guys, I am having a problem trying to get this to run on a Ray cluster. It looks like it's working, at least, but it keeps throwing a...

Here is the error:

```
Last test reward: 500.00. Training step: 8449/10000. Played games: 13. Loss: 3.64
2021-12-31 01:53:03,200 WARNING worker.py:1245 -- A worker died or was killed while executing a...
```

Four themes to the changes:
- prediction_policy_network output is 2 * action space: one mean and one standard deviation for each joint. Log_prob is summed after being calculated for each joint.
- dynamics_encoded_state_network function...
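To make the first theme concrete, here is a minimal sketch of summing per-joint Gaussian log-probabilities. It assumes a diagonal Gaussian policy whose flat output of length 2 * action_dim holds all means first and then all standard deviations. That layout and the function name `gaussian_log_prob` are hypothetical, chosen for illustration, and are not taken from the change itself.

```python
import math


def gaussian_log_prob(policy_output, action):
    """Return the summed log-density of `action` over all joints.

    policy_output: flat sequence of length 2 * len(action).
    Assumed (hypothetical) layout: means first, then standard deviations.
    """
    n = len(action)
    if len(policy_output) != 2 * n:
        raise ValueError("policy_output must have 2 * action_dim entries")
    means, stds = policy_output[:n], policy_output[n:]
    total = 0.0
    for a, mu, sigma in zip(action, means, stds):
        # log N(a | mu, sigma^2) for a single joint
        total += (
            -0.5 * ((a - mu) / sigma) ** 2
            - math.log(sigma)
            - 0.5 * math.log(2.0 * math.pi)
        )
    return total
```

Summing the per-joint terms corresponds to treating the joints as independent given the state, so the joint log-probability is the sum of the individual ones.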

1. Improved the UCB calculation: log((a+b+1)/b) + c = log(a+b+1) - log(b) + c = log1p(a+b) + K, where a: parent.visit_count, b: pb_c_base (c2 in paper), c: pb_c_init (c1 in...
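The identity can be checked numerically with a short sketch. Here K = c - log(b) is read off from the middle expression of the identity, since the original text is cut off before defining it. The function names are hypothetical, and the values of `pb_c_base` and `pb_c_init` used in the check are example values only.

```python
import math


def ucb_log_term_original(parent_visits, pb_c_base, pb_c_init):
    # Direct form: log((a + b + 1) / b) + c
    return math.log((parent_visits + pb_c_base + 1) / pb_c_base) + pb_c_init


def make_ucb_log_term(pb_c_base, pb_c_init):
    # K = c - log(b) depends only on constants, so compute it once.
    k = pb_c_init - math.log(pb_c_base)

    def term(parent_visits):
        # Rewritten form: log1p(a + b) + K
        return math.log1p(parent_visits + pb_c_base) + k

    return term


if __name__ == "__main__":
    fast_term = make_ucb_log_term(pb_c_base=19652, pb_c_init=1.25)
    for visits in (0, 1, 10, 1000):
        slow = ucb_log_term_original(visits, 19652, 1.25)
        print(visits, abs(slow - fast_term(visits)) < 1e-9)
```

Precomputing K removes one logarithm and one division from every call, which can matter because this term is evaluated for each child at each step of the tree search.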