stable-baselines LSTM policies are broken for PPO1 and TRPO

See the feature/fix_lstm branch for a test which fails for the above mentioned algorithms.

For PPO1 and TRPO the cause seems to be that the batch size is not provided to the policy (None is passed). Then the ortho-initializer has issues.

For PPO2 the assert in line 109 fails:

assert self.n_envs % self.nminibatches == 0, "For recurrent policies, "\
                        "the number of environments run in parallel should be a multiple of nminibatches."

Since the PPO2 instance is created using PPO2(policy, 'CartPole-v1'), the default parameters of PPO2 seem to be broken somehow?

For ACKTR the issue is somewhere in get_factors of kfac.py and I have no clue what that does and what goes wrong there but it complains about some shared nodes among different computation ops.

Dec 20 '18 15:12 ernestum

Related to #60 #70

Dec 20 '18 15:12 araffin

Any hints for what exactly the masks are used for? This would help a lot!

Dec 20 '18 17:12 ernestum

I did not look closely at the code, but I remember seeing masks and dones interchangeably in the code (see here fir ppo2). Then it used in the policies and especially by lstm to reset states for new episodes.

Dec 20 '18 18:12 araffin

PPO2 works if we fix the issue with the number of minibatches.

Dec 20 '18 19:12 ernestum

ACKTR is fixed on the enhancements branch, however I lost quite some time this month, and didn't have time to finish the branch (wanted to fix PPO1 and TRPO first). might try again over the next few weeks

Dec 21 '18 09:12 hill-a

I am also interested in the combination of PPO1 + CnnLstmPolicy :)

Dec 25 '18 17:12 RGring

Any updates on this ? TRPO still doesn't support MlpLstmPolicy :'(

Feb 20 '19 23:02 HareshKarnan

@HareshMiriyala for now, we don't have time to fix that (even though it is on the roadmap). Currently, we are working on fixing GAIL + A2C, this will be merged with master soon.

However, we appreciate contributions, especially to fix that kind of thing ;)

Mar 06 '19 09:03 araffin

I also mentioned this problem last year in issue #60. It seem the problem is not fixed up to now, is there any plan ? I would like to fix the problem in this month, is there any comment or advice? @araffin @hill-a @erniejunior

Dec 13 '19 13:12 hejujie

It seem the problem is not fixed up to now, is there any plan ?

Hello, There is not plan for now, we are now focusing on the migration to tf2 (cf #366 ).

I would like to fix the problem in this month, is there any comment or advice?

We would appreciate a PR, however, I think only @erniejunior is a bit familiar with the LSTM policy (I don't know much of that part of the code, which comes from OpenAI code and which is one of the most complex and undocumented part of the lib).

Dec 15 '19 15:12 araffin