Jiayi Weng

Results: 303 comments by Jiayi Weng

Thanks for your suggestion. I'll take a look when I'm free, but generally it can be integrated into the training script instead of the core library.

Here's my proposal:

1. add a test hook after each `test_episode` to report the test reward to the Optuna pruning system;
2. create an Optuna script outside the existing training script, for...
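A library-agnostic sketch of the pruning idea, assuming a median-pruning rule like Optuna's `MedianPruner`; the hook shape and the `completed` history format below are hypothetical, not Tianshou or Optuna API:

```python
import statistics

def should_prune(step, reward, completed_histories):
    """Median-pruning rule: stop a trial whose test reward at `step`
    falls below the median reward of completed trials at the same step."""
    peers = [h[step] for h in completed_histories if step in h]
    if not peers:
        return False  # nothing to compare against yet
    return reward < statistics.median(peers)

# Hypothetical per-epoch test rewards of three finished trials:
completed = [
    {0: 10.0, 1: 50.0, 2: 120.0},
    {0: 12.0, 1: 60.0, 2: 150.0},
    {0: 8.0,  1: 40.0, 2: 100.0},
]
print(should_prune(1, 30.0, completed))  # True: below median (50.0) at epoch 1
print(should_prune(1, 70.0, completed))  # False: above median at epoch 1
```

The test hook after each `test_episode` would report the reward and stop training early when this check fires.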

Feel free to make a PR if you have preliminary results.

You can use VectorEnv first. BTW, I'm not sure whether `pong_ppo.py` can work well (typically it needs hyper-parameter tuning). @Mehooz has other deadlines (2 NeurIPS papers); he will resolve this issue...

Another solution is to create a small number of envs (e.g. num = 4) for testing:

```python
envs = SubprocVectorEnv([env_fn for _ in range(num)])  # num copies of the same env factory
test_collector = Collector(policy, envs)
```

and...

@duburcqa That's great!

> My first question is, which one is better with respect to the 'code simplicity' and the 'learning performance'? It seems that I need to customize the "update" part for...

This is called a `Factorized Action Space`. It was first introduced in the Dota 2 and StarCraft II projects from OpenAI and DeepMind. I guess it needs to change a lot of code...

> But what if the reward is delayed? Doesn't that break the Markov property?

> This is on Tianshou 0.4.2, but I believe the problem has been around for a long time.

Yep, there is also a systematic issue inside Tianshou. What I think...