Jiayi Weng
Let me set up the Discord server so that we can discuss the details there.
Another request: I'm trying to [use the mujoco source code to build envpool](https://github.com/sail-sg/envpool/pull/141). However, there are some small precision issues (https://github.com/deepmind/mujoco/issues/294). The corresponding wheels are at https://github.com/sail-sg/envpool/actions/runs/2381544251 Not sure if it...
Feel free to submit a pull request to fix that issue.
I guess it may be because -1e18 is too large, so it affects the other weights of the network?
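For context, a rough numeric sketch of the concern: if a huge mask value like -1e18 ever leaks into a regression target, its gradient swamps every other update. The values below are illustrative, not taken from your code:

```python
import numpy as np

pred = np.float32(0.5)
normal_target = np.float32(1.0)
masked_target = np.float32(-1e18)  # mask value leaking into the loss

# MSE gradient w.r.t. pred is 2 * (pred - target)
grad_normal = 2 * (pred - normal_target)
grad_masked = 2 * (pred - masked_target)
print(grad_normal)  # a small, sensible update
print(grad_masked)  # astronomically large, dominates all other gradients
```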
As long as https://github.com/openai/gym/pull/3019 is not merged, it's okay not to restrict the gym version.
The 10 comes from the `batch_size` of https://github.com/thu-ml/tianshou/blob/278c91a2228a46049a29c8fa662a467121680b10/tianshou/policy/modelfree/ppo.py#L111 https://github.com/thu-ml/tianshou/blob/278c91a2228a46049a29c8fa662a467121680b10/tianshou/policy/base.py#L277 https://github.com/thu-ml/tianshou/blob/278c91a2228a46049a29c8fa662a467121680b10/tianshou/trainer/onpolicy.py#L131-L136 Could you please print `len(batch)` at the beginning of the `PPOPolicy.learn` function to see what happens?
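Roughly speaking, the trainer splits the collected data into minibatches of `batch_size` before each `learn()` call, so `len(batch)` should equal `batch_size` (except possibly for the last chunk). A toy sketch, with illustrative names rather than tianshou's actual internals:

```python
# Pretend we collected 64 transitions and split them like the trainer does.
data = list(range(64))
batch_size = 10

minibatches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]
print([len(b) for b in minibatches])  # six full batches of 10, then a remainder of 4
```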
Can you share the training script and the detailed error message? Either here or send it to my email. I guess you are using something incorrectly.
What's your observation space and action space? Can you print them and paste them here? Because I don't know what your `stimulateEnv-v0` is. Or you can delete the unnecessary parts of...
Since your action space is multi-discrete, I'd recommend using BranchingDQN; or, if you can convert it to a continuous action space, you can use all the examples under `test/continuous/`....
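If you instead want to reduce it to a single discrete action space, one common trick (sketched below with hypothetical dimension sizes, not your env's) is to flatten the multi-discrete space into one index and unflatten it inside the env:

```python
import numpy as np

# Suppose the multi-discrete space is MultiDiscrete([3, 4, 2]): 3*4*2 = 24 combos.
nvec = (3, 4, 2)

# multi-discrete action -> single flat index in Discrete(24)
flat = np.ravel_multi_index((2, 1, 0), nvec)

# flat index -> back to the multi-discrete components inside env.step()
multi = np.unravel_index(flat, nvec)
print(flat, multi)
```

This keeps the original DQN examples usable, at the cost of the action space size growing multiplicatively with each branch.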