Jiayi Weng comments

Results: 293 comments of Jiayi Weng

Add israel this system AIS

Oh I see a lot of email in spam... Sorry about that.

I don't see Data Structures here!

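1. I took Data Structures as a junior instructor (roughly a student teaching assistant), so I did not have to do the OJ problems. 2. Data Structures has an honor code agreement; in principle, students are **not allowed** to make their submitted code public.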

Does the MultiAgentPolicyManager support other policies, e.g. DiscreteSACPolicy?

It's because action masking is not implemented in DiscreteSACPolicy. The observation is a Batch containing `obs`, `mask`, and `agent_id`, so `torch.as_tensor(batch)` will throw the above error. https://github.com/thu-ml/tianshou/blob/0f59e38b126f7fb7696b79e53c86cd7b321550cb/tianshou/policy/modelfree/discrete_sac.py#L81 You can set...
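
To illustrate what handling such an observation could look like, here is a minimal sketch in plain Python. It uses a dict as a stand-in for the Batch, and `split_observation` and `masked_softmax` are hypothetical helper names, not tianshou APIs. It only shows the idea of pulling out `obs` before feeding the network and zeroing out the probability of masked actions; it is not the fix used in the library.

```python
import math


def split_observation(observation):
    """Return (obs, mask) from a dict-like observation, else (observation, None)."""
    if isinstance(observation, dict) and "obs" in observation:
        return observation["obs"], observation.get("mask")
    return observation, None


def masked_softmax(logits, mask):
    """Softmax over logits where actions with a falsy mask get probability 0."""
    if mask is None:
        mask = [True] * len(logits)
    allowed = [logit for logit, ok in zip(logits, mask) if ok]
    if not allowed:
        raise ValueError("mask disallows every action")
    top = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(logit - top) if ok else 0.0 for logit, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical multi-agent observation with the three fields named above.
observation = {"obs": [0.1, 0.2], "mask": [True, False, True], "agent_id": "player_1"}
obs, mask = split_observation(observation)
probs = masked_softmax([1.0, 5.0, 1.0], mask)  # logits are made-up network outputs
```

Only `obs` would be passed to the network; the mask is applied afterwards so that the disallowed action can never be sampled.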

SAC + LSTM

Unfortunately, there's a known issue in the current codebase (#486) that has not been fixed yet. I think that should be the root cause :(

SAC + LSTM

no plan :(

Question of logging

Have you tried setting `train_interval = 1` in TensorboardLogger? https://github.com/thu-ml/tianshou/blob/0f59e38b126f7fb7696b79e53c86cd7b321550cb/tianshou/utils/logger/base.py#L15
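
As a rough illustration of why the interval matters, the sketch below assumes the logger writes a training record only when at least `train_interval` environment steps have passed since the last write. `IntervalLogger` and `run` are hypothetical names written for this example, not the tianshou classes, and the default of 1000 is an assumption.

```python
class IntervalLogger:
    """Hypothetical logger that keeps a record only every `train_interval` steps."""

    def __init__(self, train_interval=1000):
        self.train_interval = train_interval
        self.last_log_train_step = -1
        self.records = []

    def log_train_data(self, step, reward):
        # Skip the write unless enough steps have passed since the last one.
        if step - self.last_log_train_step >= self.train_interval:
            self.records.append((step, reward))
            self.last_log_train_step = step


def run(train_interval):
    """Feed one data point every 100 steps for 3000 steps and return what was kept."""
    logger = IntervalLogger(train_interval)
    for step in range(0, 3000, 100):
        logger.log_train_data(step, reward=float(step))
    return logger.records


coarse = run(train_interval=1000)  # most points are dropped
fine = run(train_interval=1)       # every point is kept
```

With a large interval most collected results never reach the log, which can make a short run look like it logs nothing at all.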

Using wrapper or mask makes a great training but a terrible testing

Have you tried tuning `eps_test`? It really matters for the performance of DQN-family policies.

Using wrapper or mask makes a great training but a terrible testing

But in my experience, setting it to 0 actually hurts performance because Q-learning needs some randomness to escape local minima. Did you ever try `eps_test=0.01` or `0.001`...
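
For reference, this is a minimal sketch of epsilon-greedy action selection, the mechanism that a test-time epsilon controls. The function name `epsilon_greedy` and the Q-values are made up for illustration, and this is not the tianshou implementation.

```python
import random


def epsilon_greedy(q_values, eps, rng):
    """With probability eps pick a uniformly random action, else the greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


rng = random.Random(0)        # seeded so the run is repeatable
q_values = [0.1, 0.9, 0.3]    # hypothetical Q-values for three actions

# eps = 0: always the greedy action, no randomness at all.
greedy_actions = [epsilon_greedy(q_values, 0.0, rng) for _ in range(100)]

# eps = 0.01: mostly greedy, with an occasional random action.
noisy_actions = [epsilon_greedy(q_values, 0.01, rng) for _ in range(100)]
```

A small nonzero epsilon keeps the policy almost greedy while still letting it break out of a state where the greedy action loops forever.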

Using wrapper or mask makes a great training but a terrible testing

> I believe this should be a problem about code. I think so. What does your script look like? I guess maybe there's wrong argument to policy/collector -- since you...

Using wrapper or mask makes a great training but a terrible testing

Have you played with NoisyLinear? This layer behaves differently in training and in testing. See https://github.com/thu-ml/tianshou/blob/0f59e38b126f7fb7696b79e53c86cd7b321550cb/tianshou/utils/net/discrete.py#L369-L379 In fact, you can do a sanity check with the following: ```py policy.train()...