Jiayi Weng comments

Results: 303 comments of Jiayi Weng

Can we get the action probability in C51?

Closing due to no response; feel free to re-open this issue.

The usage of exploration noise

For the DQN family, it uses epsilon-greedy to add noise to the discrete action. For some other policies, it directly adds random noise sampled from a distribution to the existing continuous action...
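The two kinds of exploration noise described above can be sketched in plain Python. This is only an illustration, not Tianshou's implementation: the function names `epsilon_greedy` and `add_gaussian_noise`, the choice of Gaussian noise, and the clipping to action bounds are all assumptions made for the example.

```python
import random


def epsilon_greedy(greedy_action, num_actions, eps, rng):
    """Discrete case: with probability eps, replace the greedy action
    with an action drawn uniformly at random."""
    if rng.random() < eps:
        return rng.randrange(num_actions)
    return greedy_action


def add_gaussian_noise(action, sigma, low, high, rng):
    """Continuous case: add zero-mean Gaussian noise to every action
    dimension, then clip the result to the bounds [low, high]."""
    return [min(high, max(low, a + rng.gauss(0.0, sigma))) for a in action]


rng = random.Random(0)  # seeded only to make the sketch reproducible

# Hypothetical greedy action 2 out of 4 discrete actions.
discrete_action = epsilon_greedy(greedy_action=2, num_actions=4, eps=0.1, rng=rng)

# Hypothetical 2-dimensional continuous action bounded in [-1, 1].
continuous_action = add_gaussian_noise([0.5, -0.9], sigma=0.1, low=-1.0, high=1.0, rng=rng)
```

Setting `eps=0.0` or `sigma=0.0` switches the noise off, which is what one would typically want at evaluation time.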

The usage of exploration noise

https://github.com/thu-ml/tianshou/blob/277138ca5b050518aacaaea367192f910fbe666d/test/discrete/test_dqn.py#L118-L127

fixed problem with tuple obs being confused with return tuple

Have you read https://github.com/thu-ml/tianshou/pull/147#issuecomment-660956151? Tuple obs space is not supported because of our design choice. It will lead to many undefined behaviors in `Batch`, and further slow down the entire...

fixed problem with tuple obs being confused with return tuple

> I think allowing the Critic to have structured obs (Dict) is still a good thing though, no?

Yes, but we cannot assume which key the users want. Thus...

fixed problem with tuple obs being confused with return tuple

I wrote something here: https://tianshou.readthedocs.io/en/master/tutorials/cheatsheet.html#user-defined-environment-and-different-state-representation

Support Basic MARL

> A multi-agent venv that has agent_num agents and env_num envs should act as a venv with agent_num x env_num envs. env_id = agent_num x env_num + env_num should be contained...

Improve discrete control offline RL benchmark

> but couldn't find out how to get the done flag.

They always treat `discount` as another term for done. Ref:
https://github.com/sail-sg/envpool/blob/5b08389ec0fad903a9fb3288d54f470bc790bdfc/envpool/python/dm_envpool.py#L63
https://github.com/deepmind/deepmind-research/blob/1642ae3499c8d1135ec6fe620a68911091dd25ef/rl_unplugged/atari.py#L227
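As a rough illustration of reading `discount` as a done flag, the sketch below marks a transition as done when its discount is zero. The helper names `done_from_discount` and `td_target` are hypothetical, and the convention that a terminal step carries a discount of exactly 0 is an assumption; check it against the dataset or environment you are actually using.

```python
def done_from_discount(discounts):
    """Derive per-step done flags, assuming a discount of 0 marks a terminal step."""
    return [d == 0.0 for d in discounts]


def td_target(reward, discount, next_value, gamma=0.99):
    """One-step TD target. Multiplying by the per-step discount removes the
    bootstrap term on terminal steps, just as (1 - done) would."""
    return reward + gamma * discount * next_value


# Hypothetical per-step discounts for a short episode that ends on the last step.
discounts = [1.0, 1.0, 0.0]
dones = done_from_discount(discounts)
```

Under this assumption the two formulations are interchangeable: a buffer that expects `done` can be filled from `discount`, and a TD target can use `discount` directly without a separate flag.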
