muzero-general

Added support for multi-dimensional continuous action spaces

Open • devin-m-NRL opened this issue 2 years ago • 1 comment

The changes fall into four themes:

  • `prediction_policy_network` now outputs 2 × (action space size) values: a mean and a standard deviation for each joint. The log-probability is computed for each joint separately and then summed.
  • `dynamics_encoded_state_network` now accepts an action array as input.
  • Code that previously assumed a single scalar action now has to handle arrays: `np.random.choice`, `.item()`, and dictionaries.
  • TensorBoard changes so that video renders can be saved.
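A minimal sketch of the policy-head and action-handling ideas above, written in plain Python (standard library only) so it runs without PyTorch. It assumes the flat policy output stores all means first and all raw standard deviations second, and that a softplus keeps each standard deviation positive; the issue does not say how either is done. `split_policy_output`, `action_log_prob`, `sample_action`, `action_key`, and `dynamics_input` are hypothetical helper names, not muzero-general functions.

```python
import math
import random
from typing import List, Sequence, Tuple


def softplus(x: float) -> float:
    # Numerically stable softplus, log(1 + exp(x)); the result is always > 0.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def split_policy_output(raw: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split a flat output of length 2 * action_dim into per-joint means and stds.

    Assumption: first half = means, second half = raw stds (made positive by softplus).
    """
    if len(raw) % 2 != 0:
        raise ValueError("policy output length must be 2 * action_dim")
    d = len(raw) // 2
    means = list(raw[:d])
    stds = [softplus(s) for s in raw[d:]]
    return means, stds


def gaussian_log_prob(x: float, mean: float, std: float) -> float:
    # Log density of a 1-D normal distribution N(mean, std^2) evaluated at x.
    return -((x - mean) ** 2) / (2 * std ** 2) - math.log(std) - 0.5 * math.log(2 * math.pi)


def action_log_prob(action: Sequence[float], means: Sequence[float], stds: Sequence[float]) -> float:
    # Joints are treated as independent, so the log-probability of the whole
    # action vector is the sum of the per-joint log-probabilities.
    return sum(gaussian_log_prob(a, m, s) for a, m, s in zip(action, means, stds))


def sample_action(means: Sequence[float], stds: Sequence[float], rng: random.Random) -> List[float]:
    # Draw one value per joint from that joint's own normal distribution.
    return [rng.gauss(m, s) for m, s in zip(means, stds)]


def action_key(action: Sequence[float]) -> Tuple[float, ...]:
    # Hypothetical: a list/array is not hashable, so convert it to a tuple
    # before using it as a dictionary key (e.g. for child nodes keyed by action).
    return tuple(action)


def dynamics_input(encoded_state: Sequence[float], action: Sequence[float]) -> List[float]:
    # Hypothetical: concatenate the continuous action vector onto the encoded
    # state, in place of a one-hot vector for a single discrete action.
    return list(encoded_state) + list(action)


if __name__ == "__main__":
    rng = random.Random(0)
    raw = [0.1, -0.3, 0.5, 1.0, 0.0, -1.0]  # 3 joints: 3 means, then 3 raw stds
    means, stds = split_policy_output(raw)
    action = sample_action(means, stds, rng)
    children = {action_key(action): "child node"}
    print(action, action_log_prob(action, means, stds), len(children))
```

Summing the per-joint log-probabilities is the same as modelling the action vector with a diagonal Gaussian. The tuple conversion is one common way to make an action array usable as a dictionary key; the issue does not state how the change handles it.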

devin-m-NRL • Oct 01 '21 19:10

Results: on the Sawyer shelf environment I added, the agent reached a reward of -43, which is not great, but it performs okay. It trained on one GPU for 110,000 training steps and 55,000 self-play games over 10 days.

[Image: shelfMuZero3]
