Reinforcement-learning-with-tensorflow issues

ValueError: invalid literal for int() with base 10: 'None' when running 'env.render()'

I encountered this problem when using gym: ValueError: invalid literal for int() with base 10: 'None'. It appears at 'env.render()'. My Python version is 2.7. Could anyone tell me what...

xingyueye

PPO and Reward

Hello Zhou: I am confused about how the reward guides PPO in training the ANNs. 1. For example, I feed batch_size data to the ANNs, then I will...

yangtianyong

How can I solve the problem of action == NaN in PPO?

2

niu0717
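
The issue gives no details, so the cause here is unknown. As a hedged sketch only, two common defensive measures are keeping a Gaussian policy's standard deviation away from zero and failing fast when a non-finite action appears. `safe_sigma`, `check_action`, and the `1e-4` floor are hypothetical names and values chosen for illustration, not code from the repository.

```python
import math

SIGMA_FLOOR = 1e-4  # assumed lower bound; tune for your problem


def safe_sigma(sigma, floor=SIGMA_FLOOR):
    """Clamp a standard deviation so it never drops below `floor`."""
    return max(sigma, floor)


def check_action(action, low, high):
    """Raise early on NaN/inf, otherwise clip the action into [low, high]."""
    if not math.isfinite(action):
        raise ValueError("non-finite action: %r" % (action,))
    return min(max(action, low), high)
```

Raising at the first non-finite action makes it easier to find the update step that produced it, rather than discovering NaN values many episodes later.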

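Why does training fail to converge after I download the code?

As everyone knows, RL training is extremely unstable. I believe morvan used many small tricks during training; could you share them? Also, after downloading the DDPG code and training it, I cannot reach the results shown in the video. What is the reason?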

niniuba123456

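bug_issue: In A3C, the done returned by the environment's step() is overwritten by the check on the next line.

https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/blob/97dba9bafce7fb5203d395ba77a770fad80931b3/contents/10_A3C/A3C_continuous_action.py#L131 The done returned by the environment on line 130 (whether the game has ended) is forcibly overwritten on line 131 (whether the episode has reached its last step). In other words, even if the game ends inside the environment, the episode does not end.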

hyc6668378
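
A minimal sketch of one possible fix, assuming the loop has the shape described in this issue: combine the environment's `done` with the step-limit check instead of overwriting it. `ToyEnv`, `run_episode`, and the value `MAX_EP_STEP = 200` are hypothetical stand-ins for illustration, not code from the repository.

```python
MAX_EP_STEP = 200  # assumed step limit per episode (illustrative value)


class ToyEnv:
    """Hypothetical environment whose game ends after `end_at` steps."""

    def __init__(self, end_at):
        self.end_at = end_at
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.t >= self.end_at  # the game itself is over
        return float(self.t), -1.0, done, {}


def run_episode(env, max_steps=MAX_EP_STEP):
    """Return the number of steps taken before the episode ends."""
    env.reset()
    for ep_t in range(max_steps):
        s_, r, env_done, info = env.step(0)
        # Keep the environment's signal and OR it with the step limit,
        # instead of replacing it with the step-limit check alone.
        done = env_done or ep_t == max_steps - 1
        if done:
            return ep_t + 1
    return max_steps
```

For an environment that never signals termination on its own, this version and the overwriting version behave the same; they differ only when the environment can end the game before the step limit.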

States in the Environment.

1

Hi MorvanZhou, I have a question about the state size. I read the comments that you have added. However, I could not properly understand the state space size and how...

Kalpan13

What exactly does the reward r refer to here? How do I modify the code to obtain the reward r for my own problem?

1

env = gym.make('Pendulum-v0').unwrapped
ppo = PPO()
all_ep_r = []
for ep in range(EP_MAX):
    s = env.reset()
    buffer_s, buffer_a, buffer_r = [], [], []
    ep_r = 0
    for t in range(EP_LEN):...

liudading

Does morvan have a tutorial on how to write a simulator, that is, an environment?

1

DaDaDoDoLee
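
A minimal sketch of a hand-written environment, assuming a gym-style interface in which `reset()` returns a state and `step(a)` returns `(s_, r, done, info)`. The `LineWorld` task and its reward values are hypothetical; the point is that the reward `r` is whatever number your own `step()` computes to score a transition.

```python
class LineWorld:
    """Hypothetical 1-D task: walk from position 0 to a goal on the right."""

    def __init__(self, goal=5):
        self.goal = goal
        self.pos = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.pos = 0
        return self.pos

    def step(self, action):
        """Apply action (0 = left, 1 = right); return (s_, r, done, info)."""
        move = 1 if action == 1 else -1
        self.pos = max(0, self.pos + move)  # cannot walk left of 0
        done = self.pos == self.goal
        # The reward is defined by you: here +1 at the goal and a small
        # step penalty otherwise, to encourage short paths.
        r = 1.0 if done else -0.01
        return self.pos, r, done, {}
```

A training loop can then use it like any gym environment: call `s = env.reset()`, then `s_, r, done, info = env.step(a)` until `done` is true.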

question about tf.GraphKeys

with tf.variable_scope('eval_net'):
    # c_names(collections_names) are the collections to store variables
    c_names, n_l1, w_initializer, b_initializer = \
        ['eval_net_params', tf.GraphKeys.GLOBAL_VARIABLES], 10, \
        tf.random_normal_initializer(0., 0.3), tf.constant_initializer(0.1)  # config of layers
    # first...

WillysMa


How do I plot the reward against training episodes?
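
One possible approach, sketched under the assumption that you collect one total reward per episode in a list: smooth the values with a running average and plot both curves with matplotlib. The function names, the `0.9` decay, and the output path are hypothetical choices for illustration.

```python
def running_average(ep_rewards, decay=0.9):
    """Exponentially smooth per-episode rewards for a more readable curve."""
    smoothed = []
    for ep_r in ep_rewards:
        if not smoothed:
            smoothed.append(ep_r)
        else:
            smoothed.append(smoothed[-1] * decay + ep_r * (1 - decay))
    return smoothed


def plot_rewards(ep_rewards, path="rewards.png"):
    """Save a reward-vs-episode figure to `path` (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    episodes = range(len(ep_rewards))
    plt.plot(episodes, ep_rewards, alpha=0.3, label="raw")
    plt.plot(episodes, running_average(ep_rewards), label="smoothed")
    plt.xlabel("Episode")
    plt.ylabel("Total reward")
    plt.legend()
    plt.savefig(path)
    plt.close()
```

Append each episode's total reward to a list during training, then call `plot_rewards(that_list)` once training finishes.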
