Reinforcement-learning-with-tensorflow
Simple Reinforcement learning tutorials, 莫烦Python (MorvanZhou) Chinese AI tutorials
Hello Dr. MorvanZhou, thanks for sharing this great repository. May I ask you two simple (stupid, as I am a rookie at RL, lol) questions: 1. How can I save the...
```
def _discount_and_norm_rewards(self):
    # discount episode rewards
    discounted_ep_rs = np.zeros_like(self.ep_rs)
    running_add = 0
    for t in reversed(range(0, len(self.ep_rs))):
        running_add = running_add * self.gamma + self.ep_rs[t]
        discounted_ep_rs[t] = running_add
    # normalize...
```
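For reference, the normalization step that is cut off above typically subtracts the mean and divides by the standard deviation of the discounted returns. A minimal self-contained sketch (the method name and `gamma`/`ep_rs` attributes follow the excerpt; the class wrapper and epsilon term are illustrative):

```python
import numpy as np

class PolicyGradientSketch:
    def __init__(self, gamma=0.99):
        self.gamma = gamma
        self.ep_rs = []  # per-step rewards collected over one episode

    def _discount_and_norm_rewards(self):
        # discount episode rewards back-to-front
        discounted_ep_rs = np.zeros_like(self.ep_rs, dtype=np.float64)
        running_add = 0.0
        for t in reversed(range(len(self.ep_rs))):
            running_add = running_add * self.gamma + self.ep_rs[t]
            discounted_ep_rs[t] = running_add
        # normalize to zero mean / unit variance to stabilize the gradient scale
        discounted_ep_rs -= np.mean(discounted_ep_rs)
        discounted_ep_rs /= (np.std(discounted_ep_rs) + 1e-8)
        return discounted_ep_rs
```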
Hello, I am using simply_PPO to train my robot to walk with a learned gait, where S_DIM is 18 and A_DIM is 12. What changes should I make to simply_PPO so that it better fits my training task?
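A rough sketch of the kind of edits usually involved, assuming the script keeps the state/action sizes as module-level constants and uses a tanh-scaled Gaussian policy head (the bound value and layer width below are assumptions, not values from the repo):

```python
# Hypothetical edits for an 18-D state / 12-D action robot.
S_DIM, A_DIM = 18, 12          # match the robot's observation and joint dimensions
ACTION_BOUND = 1.0             # replace with the robot's per-joint torque/angle limit

# Inside the policy network, scale the tanh mean by the robot's action bound
# instead of the Pendulum-specific factor, e.g.:
#   mu = ACTION_BOUND * tf.layers.dense(l1, A_DIM, tf.nn.tanh, trainable=trainable)
# A wider hidden layer (e.g. 100 -> 200 units), a smaller learning rate, and longer
# training may also help with the larger state/action space.
```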
I tried your DPPO algorithm with EP_MAX = 8000 and the total moving reward is not converging. Any idea why?
If I want to save the final trajectory produced by a run as an image, is there a way to do that? When running a large project it is inconvenient to wait for the results to converge just to look at the trajectory.
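One way to do this, as a minimal sketch not taken from the repo: collect the visited states into a list during the episode, plot them with matplotlib, and write the figure to disk with `savefig` instead of showing it. The `(x, y)` state format and file name are assumptions.

```python
import matplotlib
matplotlib.use('Agg')            # headless backend: write files without opening a window
import matplotlib.pyplot as plt

def save_trajectory(states, filename='trajectory.png'):
    """states: list of (x, y) positions visited during one episode (assumed format)."""
    xs = [s[0] for s in states]
    ys = [s[1] for s in states]
    plt.figure()
    plt.plot(xs, ys, marker='.', linewidth=1)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.title('Episode trajectory')
    plt.savefig(filename, dpi=200)   # write to disk instead of plt.show()
    plt.close()
```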
Hello, I have a question. If I want to combine the actor and critic networks, how should I compute their losses? The network looks like this:
```
with tf.variable_scope('AC'):
    w_initializer = tf.random_normal_initializer(0.0, 0.01)
    l1 =...
```
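One common way to train a shared actor-critic network is to minimize a single weighted sum of the policy loss, the value (TD) loss, and an entropy bonus. A sketch in the excerpt's TF 1.x style, assuming a discrete action space; the layer sizes and weighting coefficients are illustrative, not from the repo:

```python
import tensorflow as tf   # written for TF 1.x, matching the excerpt's style

N_FEATURES, N_ACTIONS = 4, 2            # illustrative sizes
ENTROPY_BETA, VALUE_COEF = 0.01, 0.5    # illustrative loss weights

s = tf.placeholder(tf.float32, [None, N_FEATURES], 's')
a = tf.placeholder(tf.int32, [None], 'a')
td_target = tf.placeholder(tf.float32, [None], 'td_target')

with tf.variable_scope('AC'):
    w_init = tf.random_normal_initializer(0.0, 0.01)
    l1 = tf.layers.dense(s, 64, tf.nn.relu, kernel_initializer=w_init)          # shared body
    logits = tf.layers.dense(l1, N_ACTIONS, kernel_initializer=w_init)          # actor head
    v = tf.squeeze(tf.layers.dense(l1, 1, kernel_initializer=w_init), axis=1)   # critic head

advantage = td_target - v                        # TD error used as the advantage estimate
critic_loss = tf.reduce_mean(tf.square(advantage))

log_prob = -tf.nn.sparse_softmax_cross_entropy_with_logits(labels=a, logits=logits)
actor_loss = -tf.reduce_mean(log_prob * tf.stop_gradient(advantage))

probs = tf.nn.softmax(logits)
entropy = -tf.reduce_sum(probs * tf.log(probs + 1e-8), axis=1)

# single objective for the shared network: actor loss + weighted critic loss - entropy bonus
total_loss = actor_loss + VALUE_COEF * critic_loss - ENTROPY_BETA * tf.reduce_mean(entropy)
train_op = tf.train.AdamOptimizer(1e-3).minimize(total_loss)
```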
Hello Morvan, I would like to ask you a few questions (the screenshots of the equations are omitted here):
1. About the PPO pseudocode used in DPPO: that objective sums the new/old policy ratio terms from t=1 to T, but your code computes tf.reduce_mean, which matches an averaged form of the objective. I am confused about which of the two objectives is correct, or, if both are correct, what the difference is.
2. About the PG and PPO objectives; this is similar to the previous question. The traditional PG objective is an expectation over trajectories, so the terms from t=1 to T are summed, while the PPO objective is an expectation over action-state pairs. I do not understand how to go from an expectation over trajectories to an expectation over action-state pairs.
3. The PG objective itself seems to be written in two different ways: the first is definitely correct, but I do not know how to interpret the second.
I hope to get Teacher Morvan's help!
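For reference, the two objectives being compared can be written in their standard forms from the policy-gradient and PPO literature (these are generic statements, not reproductions of the repository's figures):

```latex
% Vanilla policy gradient: expectation over trajectories, summed over time steps
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}
    \Big[ \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \hat{A}_t \Big]

% PPO clipped surrogate: empirical average over sampled (s_t, a_t) pairs
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\qquad
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t
    \Big[ \min\big( r_t(\theta)\,\hat{A}_t,\;
                    \operatorname{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t \big) \Big]
```

The hatted expectation in the PPO objective denotes an empirical average over the sampled timesteps, which is exactly what tf.reduce_mean computes; replacing the mean with a sum only rescales the gradient by the batch length, a constant factor that can be absorbed into the learning rate.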
Hi @MorvanZhou, the BipedalWalker A3C example fails to converge after updating TensorFlow. It would be great if we could fix it. https://github.com/MorvanZhou/Reinforcement-learning-with-tensorflow/blob/master/experiments/Solve_BipedalWalker/A3C.py
```
W_3 Ep: 7983 | -------...
```
I downloaded the DQN code and it raises errors when run, mainly in two places:
1. In choose_action(self, observation), the line observation = observation[np.newaxis, :] fails with TypeError: tuple indices must be integers or slices, not tuple.
2. After fixing the first error (following a suggestion from someone in the course discussion forum: first observation = np.array(observation), then reshape), the transition part fails because the stored transition has a different number of columns than self.memory. Printing observation shows a strange format: (array([ 0.00107828, -0.02266533, -0.03175206, -0.04841794], dtype=float32), {}), which also differs from the format of observation_. I went back and checked the maze example, where observation and observation_ do have the same format. I don't know how to modify the RL_brain code and hope someone can give me some advice.
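The shape of that observation, an (array, dict) pair, suggests a newer gym/gymnasium release, where env.reset() returns an (observation, info) tuple and env.step() returns five values. A minimal sketch of the adjustment in the training loop rather than in RL_brain itself, assuming gym >= 0.26 or gymnasium:

```python
import gym

env = gym.make('CartPole-v1')

observation, info = env.reset()          # new API: reset returns (obs, info)
done = False
while not done:
    action = env.action_space.sample()   # stand-in for RL.choose_action(observation)
    # new API: step returns (obs, reward, terminated, truncated, info)
    observation_, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
    # store_transition / learn calls go here, now receiving plain ndarray observations
    observation = observation_
env.close()
```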
pandas==1.4.4 FutureWarning fix
## Warning message
```python
Reinforcement-learning-with-tensorflow\contents\2_Q_Learning_maze\RL_brain.py:45: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  self.q_table = self.q_table.append(
```
##...
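The deprecated DataFrame.append call can be replaced with pandas.concat by building a one-row frame for the new state. A sketch of the replacement, assuming the surrounding method looks like the tutorial's check_state_exist:

```python
import pandas as pd

def check_state_exist(self, state):
    if state not in self.q_table.index:
        # pandas >= 1.4: DataFrame.append is deprecated, so build a one-row frame
        # for the new state (all action values initialized to 0) and concatenate it
        new_row = pd.DataFrame(
            [[0] * len(self.actions)],
            columns=self.q_table.columns,
            index=[state],
        )
        self.q_table = pd.concat([self.q_table, new_row])
```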