LSTM parameters lack clarity

mpgussert opened this issue 5 years ago

Greetings all! I think this is a documentation issue, but the TL;DR is that it is extremely unclear what the parameters of a custom LSTM policy correspond to.

For example, let's say I want to train an LSTM agent to perform some task. The rollout for this task is 256 time steps, but I want the agent to observe only a sequence of 16 time steps at a time. `n_steps` is a parameter of both the algorithm (PPO2, for example) and the policy, but there's a hidden restriction on this value based on the number of environments (I understand why this is, but I had to dig for it). There's also a `reuse` parameter, described only as "reuse – (bool) If the policy is reusable or not", with no explanation of what it actually does. Is the policy stateful? Not stateful? Is it many-to-one? One-to-one? Etc.
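To make the hidden restriction concrete, here is a minimal sketch of the shape bookkeeping as I understand it from reading the PPO2 source in stable-baselines 2.x. That reading is my assumption, not documented behaviour. The helper `recurrent_policy_shapes` is hypothetical. It only mirrors the arguments PPO2 appears to pass to the two policy instances it builds, one for acting and one for training, along with the divisibility check PPO2 applies to recurrent policies.

```python
# Sketch (assumption: stable-baselines 2.x PPO2) of how the two copies of a
# recurrent policy are sized. `recurrent_policy_shapes` is a hypothetical helper,
# not part of the library; the check below mirrors the assert PPO2 uses.

def recurrent_policy_shapes(n_envs, n_steps, nminibatches):
    """Return the (n_env, n_steps, n_batch, reuse) arguments for the act and train policies."""
    # Recurrent minibatches are split across environments, never across time,
    # so each environment's full n_steps sequence stays inside one minibatch.
    if n_envs % nminibatches != 0:
        raise ValueError("For recurrent policies, the number of environments run in "
                         "parallel should be a multiple of nminibatches.")
    n_batch = n_envs * n_steps  # transitions collected per rollout
    # Acting copy: every environment advances one step at a time; it creates the weights.
    act = dict(n_env=n_envs, n_steps=1, n_batch=n_envs, reuse=False)
    # Training copy: a subset of environments, each unrolled over all n_steps;
    # reuse=True means it shares the acting copy's TensorFlow variables.
    train = dict(n_env=n_envs // nminibatches, n_steps=n_steps,
                 n_batch=n_batch // nminibatches, reuse=True)
    return act, train


act, train = recurrent_policy_shapes(n_envs=8, n_steps=16, nminibatches=4)
print(act)    # {'n_env': 8, 'n_steps': 1, 'n_batch': 8, 'reuse': False}
print(train)  # {'n_env': 2, 'n_steps': 16, 'n_batch': 32, 'reuse': True}
```

If this reading is right, `reuse` is just TensorFlow variable sharing between the two copies, and during training the LSTM is unrolled over the whole per-environment `n_steps`. That is exactly the kind of detail I'd like the docs to state explicitly.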

I understand that, if needed, you can specify a custom policy completely by hand, but if a "shorthand" method is provided, I would prefer to use it, and right now I can't because it's unclear how.

Jan 28 '20 18:01 mpgussert