DRL-FlappyBird: Why does the program only use two states?


Open · guotong1988 opened this issue 7 years ago · 4 comments

I read from here. Why does the program only use the current state and the next state? Why does using only these two states work? Thank you @songrotek

Mar 07 '17 08:03 guotong1988

Thinking about it the other way around: why not use just one state, instead of two?

Apr 12 '17 11:04 guotong1988

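The key point is that these two states are consecutive; that is, whatever happens in the second state was determined by the preceding several steps.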

Apr 12 '17 11:04 guotong1988

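The frame before the action, the action taken, the reward, the frame after the action, and the terminal flag: these 5 elements make up one training sample. http://blog.csdn.net/songrotek/article/details/50580904 describes the key elements of this algorithm, but I'm not entirely clear on it either. Happy to discuss it together.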

Sep 07 '17 00:09 saselovejulie
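
A minimal Python sketch of how such 5-element transitions are typically stored and used in DQN, assuming the repository follows the standard DQN update (the names Transition, store, q_target, sample_batch, GAMMA and MEMORY_SIZE are hypothetical, not taken from the code). It illustrates why each sample needs both the state before and the state after the action: the one-step Q-learning target is r + GAMMA * max over a' of Q(s', a'), which cannot be computed without the next state s'.

```python
import random
from collections import deque
from typing import List, NamedTuple

# Hypothetical names and constants (assumptions, not from the repository).
GAMMA = 0.99          # assumed discount factor
MEMORY_SIZE = 50000   # assumed replay-memory capacity


class Transition(NamedTuple):
    state: object        # stacked frames before the action
    action: int          # index of the action taken
    reward: float
    next_state: object   # stacked frames after the action
    terminal: bool       # True if the episode ended on this step


# Bounded buffer: once full, appending drops the oldest transition.
replay_memory: deque = deque(maxlen=MEMORY_SIZE)


def store(t: Transition) -> None:
    """Append one 5-element transition to the replay memory."""
    replay_memory.append(t)


def q_target(t: Transition, next_q_values: List[float]) -> float:
    """One-step Q-learning target y = r + GAMMA * max_a' Q(s', a').

    next_q_values are the network's Q-value estimates for t.next_state.
    A terminal transition has no future, so the target is just the reward.
    """
    if t.terminal:
        return t.reward
    return t.reward + GAMMA * max(next_q_values)


def sample_batch(batch_size: int) -> List[Transition]:
    """Draw a random minibatch of stored transitions."""
    return random.sample(list(replay_memory), batch_size)


# Usage with placeholder states (real states would be stacked frame arrays).
store(Transition(state="s0", action=1, reward=0.5, next_state="s1", terminal=False))
store(Transition(state="s1", action=0, reward=-1.0, next_state="s2", terminal=True))
y_live = q_target(replay_memory[0], next_q_values=[0.2, 2.0])  # 0.5 + GAMMA * 2.0
y_dead = q_target(replay_memory[1], next_q_values=[5.0, 7.0])  # just the reward, -1.0
batch = sample_batch(2)
```

Sampling random minibatches from the replay memory, rather than training on consecutive steps, breaks the correlation between neighbouring transitions; that is the standard reason DQN uses experience replay.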

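@guotong1988 As I read the code, it works like this: each executed action yields one new frame.
currentState = [frame 1, frame 2, frame 3, frame 4]
newState = np.append(self.currentState[:,:,1:], nextObservation, axis = 2)
After the action, newState = [frame 2, frame 3, frame 4, frame 5].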

Sep 07 '17 02:09 saselovejulie
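
The frame-stacking update described above can be sketched with NumPy as follows, assuming (as the np.append call with axis = 2 requires) that nextObservation has already been reshaped to (H, W, 1). The tiny 2×2 frame size is only for illustration, and push_frame is a hypothetical helper name.

```python
import numpy as np

# Illustrative sizes (assumptions): tiny 2x2 frames, 4 frames per state.
H, W, STACK = 2, 2, 4


def push_frame(current_state: np.ndarray, next_observation: np.ndarray) -> np.ndarray:
    """Drop the oldest frame and append the newest along the channel axis.

    current_state: shape (H, W, STACK), oldest frame at channel index 0.
    next_observation: shape (H, W, 1); np.append with axis=2 needs both
    arrays to have the same number of dimensions, so a 2-D frame must be
    reshaped first.
    """
    return np.append(current_state[:, :, 1:], next_observation, axis=2)


# Frames 1..4, each filled with its own number so the order is easy to see.
current_state = np.stack([np.full((H, W), k) for k in range(1, STACK + 1)], axis=2)
frame5 = np.full((H, W, 1), 5)
new_state = push_frame(current_state, frame5)
print(current_state[0, 0, :])  # [1 2 3 4]
print(new_state[0, 0, :])      # [2 3 4 5]
```

Because consecutive states share three of their four frames, the "two states" in a transition are really two overlapping windows over five frames. Stacking several frames is the usual DQN way to give the network motion information, such as whether the bird is rising or falling, which a single frame cannot show.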
