Jiayi Weng
Thanks for your suggestion. I'll take a look when I'm free, but generally it can be integrated with the training script instead of core.
Here's my proposal: 1. add a test hook after each `test_episode` to report the test reward to the Optuna pruning system; 2. create an Optuna script outside the existing training script, for...
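Something like this (a rough sketch; `run_training` is a hypothetical wrapper around the existing training script, and the `test_hook` argument doesn't exist yet — it's the hook proposed in (1)):

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    # hypothetical hyper-parameters to tune
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.9, 0.999)

    def test_hook(epoch: int, test_reward: float) -> None:
        # proposed hook: report each test reward so the pruner
        # can stop unpromising trials early
        trial.report(test_reward, step=epoch)
        if trial.should_prune():
            raise optuna.TrialPruned()

    # run_training: hypothetical function wrapping the existing training
    # script; it would call test_hook after each test_episode
    return run_training(lr=lr, gamma=gamma, test_hook=test_hook)

study = optuna.create_study(direction="maximize",
                            pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=50)
```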
Feel free to make a PR if you have preliminary results.
You can use VectorEnv first. BTW, I'm not sure whether pong_ppo.py can work well (especially for hyper-parameter tuning). @Mehooz has other deadlines (2 NeurIPS papers); he will resolve this issue...
Another solution is to create a small number of envs (e.g. num = 4) for testing:

```python
from tianshou.env import SubprocVectorEnv
from tianshou.data import Collector

num = 4  # small number of test envs
envs = SubprocVectorEnv([env_fn for _ in range(num)])  # env_fn: callable returning a fresh env
test_collector = Collector(policy, envs)
```

and...
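(For such a small num, a DummyVectorEnv would also work and avoids the subprocess overhead.)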
@duburcqa That's great!
> My first question is: which one is better with respect to 'code simplicity' and 'learning performance'? It seems that I need to customize the "update" part for...
This is called a `Factorized Action Space`. It was first introduced by the Dota 2 and StarCraft 2 projects from OpenAI and DeepMind. I guess it needs changes to a lot of code...
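Roughly the idea (a minimal PyTorch sketch, not Tianshou code): one shared torso, one categorical head per action dimension, and the joint log-probability is the sum over dimensions since the factors are sampled independently:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class FactorizedPolicy(nn.Module):
    """One categorical head per action dimension (e.g. move, target, skill)."""

    def __init__(self, obs_dim: int, action_dims: list[int], hidden: int = 128):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # one independent logits head per factor of the action space
        self.heads = nn.ModuleList(nn.Linear(hidden, n) for n in action_dims)

    def forward(self, obs: torch.Tensor):
        h = self.torso(obs)
        dists = [Categorical(logits=head(h)) for head in self.heads]
        actions = [d.sample() for d in dists]
        # independence assumption: joint log-prob is the sum over factors
        log_prob = sum(d.log_prob(a) for d, a in zip(dists, actions))
        return torch.stack(actions, dim=-1), log_prob
```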
> But what if the reward is delayed? Doesn't that break the Markov property?
> This is on Tianshou 0.4.2, but I believe the problem has been around for a long time.

Yep, there is also a systematic issue inside Tianshou. What I think...