PPO Baseline for MemoryMaze

Open subho406 opened this issue 1 year ago • 11 comments

Hi, Great environment. Just wondering, is there a PPO baseline available for this environment?

subho406 avatar May 30 '23 20:05 subho406

Maybe use RecurrentPPO (aka PPO LSTM) from https://github.com/Stable-Baselines-Team/stable-baselines3-contrib?

zdx3578 avatar Jun 09 '23 12:06 zdx3578

Thanks, I was wondering whether there are known/tuned hyperparameters available for this environment. I already have an Atari CNN + LSTM implementation (https://github.com/subho406/Recurrent-PPO-Jax) based on the CleanRL implementation. I tried it with the default Atari hyperparameters, but it doesn't seem to be learning on this environment.

subho406 avatar Jun 09 '23 14:06 subho406

https://github.com/NM512/dreamerv3-torch/issues/18

zdx3578 avatar Jun 15 '23 01:06 zdx3578

@subho406 We have tried running a PPO baseline, but it was pretty much flatlining at 0.

jurgisp avatar Jun 16 '23 07:06 jurgisp

@jurgisp What about this great cell? https://github.com/NeuromorphicComputing/STPN

zdx3578 avatar Jun 18 '23 02:06 zdx3578

I tried synchronous PPO on this problem, but it did not work very well: it seems to saturate at a score of around 6-7. However, I was able to get close to the IMPALA baseline on Memory Maze 9x9 using the asynchronous PPO implementation from Sample Factory (https://www.samplefactory.dev/). I used the default hyperparameters from their DMLab experiments (https://www.samplefactory.dev/09-environment-integrations/dmlab/) and changed the sequence length to 100 and the number of sequences to 32. It seems to work pretty well! I am getting a reward of around 20 after 100 million steps.
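For illustration only (this is not the actual Sample Factory configuration used above): a minimal sketch of the arithmetic implied by those two settings. It assumes that "number of sequences" means sequences per training batch and that each collected step is consumed by exactly one batch; the `SequenceBatch` class and its field names are hypothetical and do not correspond to Sample Factory option names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceBatch:
    """Hypothetical container for the two settings mentioned above."""

    sequence_length: int  # environment steps per training sequence
    num_sequences: int    # sequences per batch (assumed meaning)

    @property
    def transitions_per_batch(self) -> int:
        # Each batch holds num_sequences sequences of sequence_length steps.
        return self.sequence_length * self.num_sequences

    def batches_for(self, total_steps: int) -> int:
        # Full batches that fit into total_steps, assuming every step is
        # used once; a trailing partial batch is ignored.
        return total_steps // self.transitions_per_batch


batch = SequenceBatch(sequence_length=100, num_sequences=32)
print(batch.transitions_per_batch)     # 3200
print(batch.batches_for(100_000_000))  # 31250
```

Under these assumptions, a 100-step sequence with 32 sequences gives 3,200 transitions per batch, so 100 million steps correspond to 31,250 batches.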

subho406 avatar Jun 21 '23 21:06 subho406

@jurgisp Will the code and config used to run Dreamer V2 on the maze for the paper's benchmark be open-sourced? @subho406 Is that like the dreamerv3-torch config batch_length = 100, batch_size = 32?

zdx3578 avatar Jun 23 '23 03:06 zdx3578

@subho406 that's very interesting that you got reasonable results with Asynchronous PPO. Would you be able to share the results? Did you try it on all 4 sizes of memory maze, or just on 9x9?

@zdx3578 can you clarify your question?

jurgisp avatar Jun 23 '23 10:06 jurgisp

@jurgisp For the Memory Maze paper benchmark, will the experiments (e.g. Dreamer V2 on the maze, VAE+GRU on the maze) be open-sourced so that other people can run them or build on them?

zdx3578 avatar Jun 25 '23 01:06 zdx3578

The VAE+GRU experiment could be changed to VAE+STPN.

zdx3578 avatar Jun 26 '23 13:06 zdx3578

@jurgisp Yes, I am working on the results for a paper. Happy to share them when they are ready!

subho406 avatar Jul 11 '23 20:07 subho406