Jiayi Weng

Results 303 comments of Jiayi Weng

Update: I can't reproduce the above error. I think it's because of a version mismatch, since I ran with the current master version instead of `tianshou==0.4.11`. Will post a new...

You can make a buffer, load the data into RAM, reformat it to be ReplayBuffer-compatible, and save it. This is a great example to start with: https://github.com/thu-ml/tianshou/blob/4756ee80ff11cd8692aef3752f35c0af60a452e8/examples/offline/convert_rl_unplugged_atari.py
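
A rough sketch of the "reformat" step only, in plain Python so it stays independent of any Tianshou version. The input layout (`observations`, `actions`, `rewards` per episode), the helper name `episodes_to_transitions`, and the rule that `done` is set on the last step of each episode are all assumptions for illustration. The output keys `obs`/`act`/`rew`/`done`/`obs_next` follow the per-transition naming used by older Tianshou buffers; newer versions may expect different fields.

```python
def episodes_to_transitions(episodes):
    """Flatten episode-structured data into per-step transition records.

    Hypothetical input layout: each episode is a dict with
    'observations' (T + 1 items), 'actions' (T items), 'rewards' (T items).
    """
    transitions = []
    for episode in episodes:
        obs = episode["observations"]
        acts = episode["actions"]
        rews = episode["rewards"]
        steps = len(acts)
        for t in range(steps):
            transitions.append(
                {
                    "obs": obs[t],
                    "act": acts[t],
                    "rew": rews[t],
                    # Assumption: the episode terminates on its final step.
                    "done": t == steps - 1,
                    "obs_next": obs[t + 1],
                }
            )
    return transitions


if __name__ == "__main__":
    demo = [{"observations": [0, 1, 2], "actions": [1, 0], "rewards": [0.0, 1.0]}]
    for record in episodes_to_transitions(demo):
        print(record)
```

Each record would then be added to the buffer one transition at a time before saving; the linked conversion script is the place to check the actual buffer calls.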

Ooh, try installing an editable version of Tianshou: @MischaPanch has changed a lot recently, and 0.5.1 was released a year ago. Alternatively, you can change the reset call in SubprocVecEnv,...

Agree in terms of flexibility but `min(critic1, critic2)` is the original setting in the paper. Would you like to submit a PR?
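
For readers unfamiliar with the setting being discussed, here is a minimal sketch of a clipped double-Q target, the pattern used by methods such as TD3 and SAC, where the bootstrap value is the minimum of two critic estimates. The function name and the `reduce` argument are hypothetical and are not part of Tianshou's API; `reduce` only illustrates what a more flexible variant could look like. Any entropy term is omitted.

```python
def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99, reduce=min):
    """One-step TD target that combines two critic estimates.

    `reduce` is a hypothetical hook: the default `min` is the clipped
    double-Q choice, and other callables show the flexible alternative.
    """
    next_value = reduce(q1_next, q2_next)
    # No bootstrapping past a terminal transition.
    return reward + gamma * (1.0 - float(done)) * next_value


if __name__ == "__main__":
    # Default: pessimistic estimate from the two critics.
    print(clipped_double_q_target(1.0, False, 2.0, 3.0, gamma=0.5))
    # Flexible variant: average the critics instead.
    print(clipped_double_q_target(1.0, False, 2.0, 3.0, gamma=0.5,
                                  reduce=lambda a, b: (a + b) / 2))
```

Keeping `min` as the default preserves the original behaviour while letting callers opt in to something else.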

Sorry for being late here. I think it's the same as https://github.com/thu-ml/tianshou/issues/692?

You need to provide a valid action mask as part of the observation. Please take a look at the implementation details (especially the signature of `env.step(act)`) in the TicTacToe env.
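
A small sketch of what "action mask as part of the observation" can look like, written in plain Python. The dict keys `obs` and `mask`, and both helper names, are assumptions for illustration; check the TicTacToe env for the exact layout it returns.

```python
def make_observation(board, legal_moves, num_actions):
    """Bundle the raw observation with a boolean mask of legal actions."""
    mask = [action in legal_moves for action in range(num_actions)]
    return {"obs": board, "mask": mask}


def masked_greedy_action(q_values, mask):
    """Pick the highest-valued action among those the mask allows."""
    best = None
    for action, (value, allowed) in enumerate(zip(q_values, mask)):
        if allowed and (best is None or value > q_values[best]):
            best = action
    if best is None:
        raise ValueError("mask allows no action")
    return best


if __name__ == "__main__":
    observation = make_observation([0] * 9, legal_moves={0, 4}, num_actions=9)
    q_values = [0.1, 0.9, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0]
    # Action 1 has the highest value but is illegal, so a legal one is chosen.
    print(masked_greedy_action(q_values, observation["mask"]))
```

The point of the mask is that the policy never has to learn which moves are illegal; it simply never considers them.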

It's already in the example; see https://github.com/thu-ml/tianshou/blob/4ac407c78f58102fa7f38ded6bfc1e42c703a4a7/examples/mujoco/mujoco_ppo.py#L168-L169 and https://github.com/thu-ml/tianshou/blob/4ac407c78f58102fa7f38ded6bfc1e42c703a4a7/examples/mujoco/mujoco_ppo.py#L202-L204

I guess our previous assumption for the pettingzoo env wrapper was that it would provide an int -> xxx agent_id mapping, but somehow it changed to a str -> xxx agent_id mapping....

https://github.com/copilot/c/91bb6b3c-b325-4400-ba0e-85e87af043f7 Q: Where does it perform test_episode before the first train step? A: In the `BaseTrainer` class from the `tianshou.trainer.base` module, testing is performed before the first training step in the...