muzero-general added support for multiple dimension continuous action spaces

Open devin-m-NRL opened this issue 2 years ago • 1 comments

The changes fall into four themes:

The prediction_policy_network output has size 2 * action space dimension: one mean and one standard deviation for each joint. The log_prob is calculated for each joint and then summed.
The dynamics_encoded_state_network now takes an action array into account.
Code that now needs to work with arrays: np.random.choice, item, and dictionary handling.
Changes to the TensorBoard logging so that it saves video renders.
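
To illustrate the first and third themes, here is a minimal, framework-free sketch and not the code from this PR. It assumes the policy head emits the means first and then the log standard deviations (the actual layout may differ), treats the joints as independent so the per-joint log-densities can be summed, and uses a rounded tuple as a dictionary key for an action array. The names `split_policy_output`, `gaussian_log_prob`, and `action_key` are hypothetical.

```python
import math


def split_policy_output(policy_output):
    """Split a flat head output of length 2 * action_dim into (means, stds).

    Assumption: the first half holds the means and the second half holds
    log standard deviations; the layout used in the PR may differ.
    """
    if len(policy_output) % 2 != 0:
        raise ValueError("policy output length must be 2 * action_dim")
    action_dim = len(policy_output) // 2
    means = list(policy_output[:action_dim])
    # Exponentiate so every standard deviation is strictly positive.
    stds = [math.exp(v) for v in policy_output[action_dim:]]
    return means, stds


def gaussian_log_prob(action, means, stds):
    """Sum the per-joint Gaussian log-densities into one log-probability."""
    total = 0.0
    for a, mu, sigma in zip(action, means, stds):
        total += (
            -0.5 * ((a - mu) / sigma) ** 2
            - math.log(sigma)
            - 0.5 * math.log(2.0 * math.pi)
        )
    return total


def action_key(action, decimals=6):
    """Turn an action array into a hashable tuple usable as a dict key."""
    return tuple(round(float(a), decimals) for a in action)


# Hypothetical two-joint example: means (0.0, 1.0), log-stds (0.0, 0.0).
means, stds = split_policy_output([0.0, 1.0, 0.0, 0.0])
log_prob = gaussian_log_prob([0.0, 1.0], means, stds)

# Lists and arrays are not hashable, so the tree stores children by tuple.
children = {action_key([0.1, 0.2]): "child node"}
```

Summing the log-densities is equivalent to multiplying the per-joint probabilities, which only holds if the joints are modelled as independent. The tuple key is one possible way to keep dictionary lookups working once an action is no longer a single integer.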

Oct 01 '21 19:10 devin-m-NRL

Results: on the Sawyer shelf environment I added, the agent reached a reward of -43, which is not great, but it performs okay. It trained on one GPU for 110,000 training steps and 55,000 self-play games over 10 days.

[Image: shelfMuZero3]

Oct 01 '21 19:10 devin-m-NRL
