Jiayi Weng comments

Results: 303 comments of Jiayi Weng

Getting started example causes TypeError: object of type 'TimeLimit' has no len()

Update: I can't reproduce the above error. I think it was caused by a version mismatch, since I ran the current master version instead of `tianshou==0.4.11`. Will post a new...

How to train an offline BCQ model with custom logged data?

You can create a buffer: load the data into RAM, reformat it to be ReplayBuffer-compatible, and save it. This is a great example to start with: https://github.com/thu-ml/tianshou/blob/4756ee80ff11cd8692aef3752f35c0af60a452e8/examples/offline/convert_rl_unplugged_atari.py
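A minimal stdlib-only sketch of the "reformat" step, assuming your log stores each episode as `obs` (one entry longer than `act`, because it includes the final observation), `act`, `rew`, and a hypothetical `timeout` flag that marks time-limit endings. The column names follow the Tianshou 0.5 `ReplayBuffer` convention (`obs`, `act`, `rew`, `terminated`, `truncated`, `obs_next`); 0.4.x used a single `done` flag instead, so check against your installed version. `episodes_to_columns` is a hypothetical helper, not a Tianshou function.

```python
from typing import Any, Dict, List, Sequence


def episodes_to_columns(episodes: Sequence[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Flatten logged episodes into per-transition columns (hypothetical helper)."""
    keys = ("obs", "act", "rew", "terminated", "truncated", "obs_next")
    cols: Dict[str, List[Any]] = {k: [] for k in keys}
    for ep in episodes:
        obs, act, rew = ep["obs"], ep["act"], ep["rew"]
        steps = len(act)
        if len(obs) != steps + 1 or len(rew) != steps:
            raise ValueError("expected len(obs) == len(act) + 1 == len(rew) + 1")
        for t in range(steps):
            last = t == steps - 1
            cols["obs"].append(obs[t])
            cols["act"].append(act[t])
            cols["rew"].append(rew[t])
            # obs_next is simply the observation shifted by one step.
            cols["obs_next"].append(obs[t + 1])
            # Only the final step ends the episode; `timeout` decides which flag is set.
            cols["terminated"].append(last and not ep["timeout"])
            cols["truncated"].append(last and ep["timeout"])
    return cols
```

The resulting columns can then be wrapped in a `Batch`, added to a `ReplayBuffer`, and written out, for example with `ReplayBuffer.save_hdf5`.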

Regarding the error related to SEED when I train in a homebrew environment

Ooh, try installing an editable version of Tianshou: @MischaPanch has changed a lot recently, and 0.5.1 was released a year ago. Alternatively, you can change the reset call in SubprocVecEnv,...
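If the SEED error comes from the homebrew environment's own `reset` (an assumption; the comment above does not say so), one thing worth checking is that the env follows the Gymnasium convention that Tianshou 0.5+ targets: `reset` accepts keyword-only `seed` and `options` and returns `(obs, info)`, and `step` returns a 5-tuple. `HomebrewEnv` below is a hypothetical stdlib-only stand-in, not a real environment.

```python
import random
from typing import Any, Dict, Optional, Tuple


class HomebrewEnv:
    """Hypothetical custom env following the Gymnasium reset/step signatures."""

    def __init__(self) -> None:
        self._rng = random.Random()
        self._state = 0

    def reset(
        self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None
    ) -> Tuple[int, Dict[str, Any]]:
        # Re-seed the env's own RNG only when a seed is passed in.
        if seed is not None:
            self._rng = random.Random(seed)
        self._state = self._rng.randint(0, 100)
        return self._state, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        # Returns (obs, reward, terminated, truncated, info).
        self._state += action
        terminated = self._state >= 200
        return self._state, 1.0, terminated, False, {}
```

Seeding through `reset(seed=...)` rather than a separate `seed()` method is what makes two resets with the same seed reproducible.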

SAC implementation: consider making the reduction operator a parameter, as "min" is not always the best choice

Agreed in terms of flexibility, but `min(critic1, critic2)` is the original setting in the paper. Would you like to submit a PR?
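For reference, a per-sample sketch of the soft Bellman target with the reduction exposed as an argument. The default `min` is the clipped double-Q setting from the paper; `reduce_fn` and the `mean` alternative are hypothetical knobs for illustration, not existing Tianshou parameters. A batched implementation would apply the same reduction elementwise across critics.

```python
from typing import Callable, Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def sac_target(
    reward: float,
    done: bool,
    next_qs: Sequence[float],
    next_log_prob: float,
    gamma: float = 0.99,
    alpha: float = 0.2,
    reduce_fn: Callable[[Sequence[float]], float] = min,
) -> float:
    """y = r + gamma * (1 - d) * (reduce(Q1', Q2') - alpha * log pi(a'|s'))."""
    q_next = reduce_fn(next_qs)  # min over target critics in the original SAC
    soft_value = q_next - alpha * next_log_prob
    return reward + gamma * (1.0 - float(done)) * soft_value
```

With `next_qs=[2.0, 3.0]`, `min` picks the pessimistic 2.0 while `mean` uses 2.5, which is exactly the bias/flexibility trade-off discussed above.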

My custom PettingZoo env is working with DQNPolicy but not with PPOPolicy: AttributeError: 'str' object has no attribute 'ndim'

Sorry for being late here. I think it's the same as https://github.com/thu-ml/tianshou/issues/692?

About the HER implementation in the PickAndPlace-v2 environment

@Juno-T any idea?

Action mask in TicTacToe

You need to provide a valid action mask as part of the observation. Please take a look at the implementation details (especially the signature of `env.step(act)`) in the TicTacToe env.
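A minimal sketch of the idea, assuming an observation dict with `agent_id`, `obs`, and `mask` keys (the layout used by Tianshou's multi-agent examples; verify it against your version) and a 9-cell board where 0 means empty. `make_observation` and `masked_argmax` are hypothetical helpers: the mask marks legal moves, and illegal actions get `-inf` before the argmax so they can never be chosen.

```python
import math
from typing import Dict, Sequence


def make_observation(board: Sequence[int], agent_id: str) -> Dict[str, object]:
    # A cell is a legal move iff it is still empty (0).
    mask = [cell == 0 for cell in board]
    return {"agent_id": agent_id, "obs": list(board), "mask": mask}


def masked_argmax(q_values: Sequence[float], mask: Sequence[bool]) -> int:
    if not any(mask):
        raise ValueError("no legal action available")
    # Push illegal actions to -inf so the argmax only considers legal ones.
    masked = [q if ok else -math.inf for q, ok in zip(q_values, mask)]
    return max(range(len(masked)), key=masked.__getitem__)
```

For example, on the board `[1, 0, -1, 0, 0, 0, 0, 0, 0]` the highest raw Q-value (cell 0) is occupied, so the masked argmax falls back to the best empty cell.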

[Question] Best practice to save and resume training with PPO + reward normalization

It's already in the example; see https://github.com/thu-ml/tianshou/blob/4ac407c78f58102fa7f38ded6bfc1e42c703a4a7/examples/mujoco/mujoco_ppo.py#L168-L169 and https://github.com/thu-ml/tianshou/blob/4ac407c78f58102fa7f38ded6bfc1e42c703a4a7/examples/mujoco/mujoco_ppo.py#L202-L204
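Whichever normalization you use, the point for resuming is that running statistics live outside the network weights, so a checkpoint that stores only the policy parameters restarts normalization from scratch. Below is a minimal stdlib-only sketch of saving both together. `RunningMeanStd` here is a hypothetical scalar re-implementation (Tianshou ships its own `tianshou.utils.RunningMeanStd`), and JSON strings stand in for `torch.save`/`torch.load`.

```python
import json
import math
from typing import Any, Dict, Sequence, Tuple


class RunningMeanStd:
    """Hypothetical stdlib stand-in for a running normalizer (scalars only)."""

    def __init__(self, mean: float = 0.0, var: float = 1.0, count: float = 1e-4) -> None:
        self.mean, self.var, self.count = mean, var, count

    def update(self, xs: Sequence[float]) -> None:
        # Merge batch statistics into the running ones (parallel variance formula).
        n = len(xs)
        if n == 0:
            return
        batch_mean = sum(xs) / n
        batch_var = sum((x - batch_mean) ** 2 for x in xs) / n
        delta = batch_mean - self.mean
        total = self.count + n
        m2 = self.var * self.count + batch_var * n + delta ** 2 * self.count * n / total
        self.mean = self.mean + delta * n / total
        self.var = m2 / total
        self.count = total

    def normalize(self, x: float, eps: float = 1e-8) -> float:
        return (x - self.mean) / math.sqrt(self.var + eps)

    def state_dict(self) -> Dict[str, float]:
        return {"mean": self.mean, "var": self.var, "count": self.count}

    @classmethod
    def from_state_dict(cls, state: Dict[str, float]) -> "RunningMeanStd":
        return cls(**state)


def dump_checkpoint(model_state: Dict[str, Any], rms: RunningMeanStd) -> str:
    # Store normalization statistics next to the weights: they are not model parameters.
    return json.dumps({"model": model_state, "rms": rms.state_dict()})


def load_checkpoint(text: str) -> Tuple[Dict[str, Any], RunningMeanStd]:
    ckpt = json.loads(text)
    return ckpt["model"], RunningMeanStd.from_state_dict(ckpt["rms"])
```

After loading, hand the restored statistics back to whatever normalizes observations or returns before continuing training; otherwise the first post-resume updates see differently scaled inputs.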

The agent_id should not be of type 'int' but a string such as "player_1".

I guess our previous assumption for the PettingZoo env wrapper was that it would provide an int -> xxx agent_id mapping, but somehow it changed to a str -> xxx agent_id mapping....
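One way to bridge the two conventions, sketched under the assumption that agent names come from PettingZoo's `possible_agents` list (e.g. `"player_1"`, `"player_2"`): build a str -> int index once and accept either form when looking up a policy. `build_agent_index` and `select_policy` are hypothetical helpers, not part of Tianshou's wrapper.

```python
from typing import Dict, Sequence, TypeVar, Union

P = TypeVar("P")


def build_agent_index(possible_agents: Sequence[str]) -> Dict[str, int]:
    # Map each agent name to a stable integer position.
    return {name: i for i, name in enumerate(possible_agents)}


def select_policy(agent_id: Union[str, int], policies: Sequence[P], index: Dict[str, int]) -> P:
    # Accept both the old int ids and the new str ids.
    i = agent_id if isinstance(agent_id, int) else index[agent_id]
    return policies[i]
```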

How does the first test reward come before the first epoch?

https://github.com/copilot/c/91bb6b3c-b325-4400-ba0e-85e87af043f7

Q: Where does it perform test_episode before the first train step?

A: In the `BaseTrainer` class from the `tianshou.trainer.base` module, testing is performed before the first training step in the...
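The answer above is cut off, but the general pattern it points to (evaluating once before the first epoch so there is an "epoch 0" reward that later epochs are compared against) can be sketched without Tianshou. `run_training`, `train_one_epoch`, and `evaluate` are hypothetical names, not the `BaseTrainer` API.

```python
from typing import Callable, List, Tuple


def run_training(
    train_one_epoch: Callable[[int], None],
    evaluate: Callable[[], float],
    max_epoch: int,
) -> Tuple[List[Tuple[int, float]], int, float]:
    history: List[Tuple[int, float]] = []
    # Initial test before any training step: establishes the baseline best reward.
    best_reward = evaluate()
    best_epoch = 0
    history.append((0, best_reward))
    for epoch in range(1, max_epoch + 1):
        train_one_epoch(epoch)
        reward = evaluate()
        history.append((epoch, reward))
        # Track the best epoch so far against the pre-training baseline.
        if reward > best_reward:
            best_epoch, best_reward = epoch, reward
    return history, best_epoch, best_reward
```

This is why a test reward shows up in the logs before epoch 1 has finished: it is the untrained policy's score.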