Jiayi Weng

Results 293 comments of Jiayi Weng

Oh, I see a lot of emails ended up in my spam folder... Sorry about that.

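1. I took the Data Structures course as a junior instructor (roughly a student teaching assistant), so I did not have to do the OJ problems.
2. The Data Structures course has an honor code agreement; in principle, students are **not allowed** to publish the code they submitted.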

It's because action masking is not implemented in DiscreteSACPolicy. The observation is a Batch containing `obs`, `mask`, and `agent_id`, so `torch.as_tensor(batch)` throws the above error. https://github.com/thu-ml/tianshou/blob/0f59e38b126f7fb7696b79e53c86cd7b321550cb/tianshou/policy/modelfree/discrete_sac.py#L81 You can set...
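To illustrate what handling the `mask` field would involve, here is a minimal sketch in plain Python. It is not Tianshou's implementation: the `masked_softmax` helper, the dict standing in for the Batch, and the example values are all hypothetical. The idea is to read `obs` and `mask` separately instead of converting the whole observation to a tensor, then give unavailable actions zero probability.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over logits that assigns zero probability where mask is False."""
    # Unavailable actions get -inf so that exp() maps them to exactly 0.
    masked = [l if m else -math.inf for l, m in zip(logits, mask)]
    top = max(masked)
    if top == -math.inf:
        raise ValueError("at least one action must be available")
    exps = [math.exp(l - top) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical observation with the three fields named in the comment above.
observation = {"obs": [0.2, -1.0], "mask": [True, False, True], "agent_id": "player_1"}

# Hypothetical actor output; action 1 has the highest logit but is masked out.
logits = [1.0, 5.0, 1.0]
probs = masked_softmax(logits, observation["mask"])
print(probs)
```

In a real policy the same masking would be applied to the actor's logits before building the categorical distribution.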

Unfortunately, there's a known issue in the current codebase (#486) that has not been fixed. I think that is the root cause :(

no plan :(

Have you tried setting `train_interval = 1` in TensorboardLogger? https://github.com/thu-ml/tianshou/blob/0f59e38b126f7fb7696b79e53c86cd7b321550cb/tianshou/utils/logger/base.py#L15

Have you tried tuning `eps_test`? It really matters for the performance of DQN-family policies.

But in my experience, setting it to 0 actually hurts performance because Q-learning needs some randomness to escape local minima. Did you ever try eps_test==0.01 or 0.001...
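For readers unfamiliar with what a test-time epsilon controls, here is a minimal epsilon-greedy sketch in plain Python. It is an illustration only, not Tianshou's code: `epsilon_greedy`, `non_greedy_fraction`, and the Q-values are hypothetical names and numbers. With probability `eps` the agent takes a uniformly random action; otherwise it takes the action with the highest Q-value.

```python
import random


def epsilon_greedy(q_values, eps, rng):
    """Return a random action with probability eps, else the greedy action."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def non_greedy_fraction(q_values, eps, rng, trials=10_000):
    """Estimate how often the chosen action differs from the greedy one."""
    greedy = max(range(len(q_values)), key=q_values.__getitem__)
    misses = sum(epsilon_greedy(q_values, eps, rng) != greedy for _ in range(trials))
    return misses / trials


rng = random.Random(0)
q = [0.1, 0.9, 0.0]  # hypothetical Q-values for three actions
for eps in (0.0, 0.001, 0.01):
    print(eps, non_greedy_fraction(q, eps, rng))
```

With `eps = 0.0` the policy is fully deterministic, which is the setting the comment above warns about; small values such as 0.001 or 0.01 keep a little randomness while staying almost greedy.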

> I believe this should be a problem about code.

I think so. What does your script look like? I guess a wrong argument is being passed to the policy/collector -- since you...

Have you played with NoisyLinear? This layer behaves differently during training and testing. See https://github.com/thu-ml/tianshou/blob/0f59e38b126f7fb7696b79e53c86cd7b321550cb/tianshou/utils/net/discrete.py#L369-L379 In fact, you can do a sanity check with the following: ```py policy.train()...