Jiayi Weng

303 comments by Jiayi Weng

Closing due to no response; feel free to reopen this issue.

The DQN family uses epsilon-greedy exploration to add noise to discrete actions. Some other policies instead add random noise, sampled from a distribution, directly to the continuous action...

https://github.com/thu-ml/tianshou/blob/277138ca5b050518aacaaea367192f910fbe666d/test/discrete/test_dqn.py#L118-L127
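The two exploration styles described above can be sketched in plain NumPy. This is a minimal illustration, not Tianshou's implementation: `epsilon_greedy` and `gaussian_noise` are hypothetical helper names, and the clipping to `[low, high]` assumes a bounded continuous action space.

```python
import numpy as np


def epsilon_greedy(q_values, eps, rng):
    """Discrete actions: take argmax Q with probability 1 - eps, else a uniform random action.

    q_values has shape (batch, n_actions); one action index is returned per row.
    """
    q_values = np.asarray(q_values)
    greedy = q_values.argmax(axis=-1)
    n_actions = q_values.shape[-1]
    random_actions = rng.integers(n_actions, size=greedy.shape)
    # Each row independently decides whether to explore.
    explore = rng.random(greedy.shape) < eps
    return np.where(explore, random_actions, greedy)


def gaussian_noise(action, sigma, low, high, rng):
    """Continuous actions: add zero-mean Gaussian noise, then clip to the (assumed) action bounds."""
    noisy = np.asarray(action, dtype=float) + rng.normal(0.0, sigma, size=np.shape(action))
    return np.clip(noisy, low, high)


rng = np.random.default_rng(0)
q = np.array([[1.0, 3.0, 2.0], [0.5, 0.1, 0.2]])
greedy_actions = epsilon_greedy(q, eps=0.0, rng=rng)  # no exploration: pure argmax
noisy_actions = gaussian_noise(np.array([0.5, -0.5]), sigma=0.1, low=-1.0, high=1.0, rng=rng)
```

In practice `eps` and `sigma` are usually decayed over training; the schedule is left out here.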

Still no reply :(

Have you read https://github.com/thu-ml/tianshou/pull/147#issuecomment-660956151? Tuple observation spaces are not supported because of a design choice: they would lead to many undefined behaviors in `Batch` and further slow down the entire...
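One possible workaround, sketched here as an assumption rather than an official recipe, is to convert each tuple observation into a dict so that every element gets its own key and can be stacked across environments. The helper `stack_tuple_obs` and the `obs_0`, `obs_1`, ... key names are hypothetical.

```python
import numpy as np


def stack_tuple_obs(obs_list):
    """Turn a list of per-env tuple observations into a dict of stacked arrays.

    Element i of every tuple is stacked under the (hypothetical) key f"obs_{i}",
    so each key holds an array with a leading batch dimension.
    """
    n_elems = len(obs_list[0])
    return {
        f"obs_{i}": np.stack([np.asarray(obs[i]) for obs in obs_list])
        for i in range(n_elems)
    }


# Two environments, each returning a (vector, scalar) tuple observation.
batched = stack_tuple_obs([(np.zeros(2), 1), (np.ones(2), 2)])
```

Keyed arrays with a shared leading dimension avoid the ambiguity of positional tuple elements when observations are batched.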

> I think allowing the Critic to have structured obs (Dict) is still a good thing though, no?

Yes, but we cannot assume which key the users want. Thus...
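Because the library cannot guess which keys matter, the selection naturally lives in user code. A minimal sketch of such a user-side preprocessing step, assuming batched dict observations whose values are NumPy arrays; `flatten_dict_obs` and the example keys are hypothetical, and the resulting flat vector would then be fed to whatever critic network the user defines.

```python
import numpy as np


def flatten_dict_obs(obs, keys):
    """Concatenate the user-chosen keys of a batched dict observation into one flat vector per sample.

    Every obs[k] is assumed to have a leading batch dimension; keys not listed are ignored.
    """
    parts = []
    for k in keys:
        arr = np.asarray(obs[k], dtype=np.float32)
        parts.append(arr.reshape(arr.shape[0], -1))  # flatten everything after the batch dim
    return np.concatenate(parts, axis=-1)


obs = {
    "image": np.zeros((4, 2, 3)),
    "pos": np.ones((4, 2)),
    "ignored": np.ones((4, 5)),
}
critic_input = flatten_dict_obs(obs, keys=["image", "pos"])
```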

I wrote up how to handle user-defined environments and different state representations here: https://tianshou.readthedocs.io/en/master/tutorials/cheatsheet.html#user-defined-environment-and-different-state-representation

> A multi-agent venv with agent_num agents and env_num envs should act as a venv with agent_num x env_num envs. env_id = agent_num x env_num + env_num should be contained...
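The idea of treating agent_num agents across env_num envs as one flat venv of agent_num x env_num slots can be sketched with a row-major index mapping. The quoted formula is truncated, so this is not that proposal's exact formula: it assumes an agent-major layout (all envs of agent 0 first, then agent 1, and so on), and `flat_index` / `unflat_index` are hypothetical names.

```python
def flat_index(agent_id, env_idx, env_num):
    """Map an (agent, env) pair to a single slot in a flat venv of agent_num * env_num envs.

    Assumes an agent-major layout: slot = agent_id * env_num + env_idx.
    """
    return agent_id * env_num + env_idx


def unflat_index(flat_id, env_num):
    """Inverse mapping: recover (agent_id, env_idx) from a flat slot."""
    return divmod(flat_id, env_num)


agent_num, env_num = 2, 3
slots = [flat_index(a, e, env_num) for a in range(agent_num) for e in range(env_num)]
```

Every (agent, env) pair gets a unique slot in `0 .. agent_num * env_num - 1`, which is what lets the multi-agent venv behave like an ordinary one.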

> but couldn't find out how to get the done flag.

They always treat `discount` as another component of the done signal. References:

https://github.com/sail-sg/envpool/blob/5b08389ec0fad903a9fb3288d54f470bc790bdfc/envpool/python/dm_envpool.py#L63

https://github.com/deepmind/deepmind-research/blob/1642ae3499c8d1135ec6fe620a68911091dd25ef/rl_unplugged/atari.py#L227
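A sketch of recovering done flags from a dm_env-style timestep, assuming the usual convention: the episode ends when `step_type` is LAST, a true terminal transition carries `discount == 0`, and a truncated episode ends with LAST but a nonzero discount. The constant `LAST = 2` mirrors `dm_env.StepType.LAST` and is hard-coded here only to avoid the dependency; `done_flags` is a hypothetical helper.

```python
import numpy as np

LAST = 2  # mirrors dm_env.StepType.LAST (FIRST = 0, MID = 1)


def done_flags(step_type, discount):
    """Derive (done, terminated, truncated) from batched step_type and discount arrays.

    done: the episode ended for any reason.
    terminated: it ended in a true terminal state (discount == 0).
    truncated: it ended without a terminal state, e.g. a time limit (discount != 0).
    """
    step_type = np.asarray(step_type)
    discount = np.asarray(discount)
    done = step_type == LAST
    terminated = done & (discount == 0)
    truncated = done & (discount != 0)
    return done, terminated, truncated


done, terminated, truncated = done_flags([1, 2, 2], [1.0, 0.0, 1.0])
```

Keeping `terminated` separate from `truncated` matters for bootstrapping: only truly terminal transitions should drop the next-state value.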