zedwei

6 comments by zedwei

I'm facing a similar issue. I guess there are two things that may cause this: 1. Due to "connection_fraction", a fraction of the full connections is created randomly, which can...

I'm working on the Tractor (拖拉机, 80分, 升级, etc., whatever you call it) card game :). The action space is a bit tricky, as in Tractor you can 甩牌 (throw a multi-card combination) and...

This is great to know! Although my trained agent can now beat the rule-based agent, it's still far from competing with humans, e.g., it was able to learn...

@daochenzha - Similarly, [this line](https://github.com/datamllab/rlcard/blob/6df745d3404a97d884640966a3b3a1d059c5fd0f/rlcard/agents/dqn_agent.py#L178) in the DQN train() function probably also needs to be reconsidered/experimented with: `best_actions = np.argmax(q_values_next, axis=1)`. Theoretically it makes more sense (at least to me) to apply...
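The idea behind restricting that argmax could be sketched like this: instead of taking the argmax over all actions, mask out the next state's illegal actions before selecting the best action for the target. This is a minimal illustration, not RLCard's actual implementation; the helper name and the shape of `legal_actions_batch` are assumptions.

```python
import numpy as np

def masked_best_actions(q_values_next, legal_actions_batch):
    """Pick the best next action per sample, considering only legal actions.

    q_values_next:       array of shape (batch, num_actions)
    legal_actions_batch: list of lists, legal action ids for each next state
    (hypothetical helper; names are illustrative, not RLCard's API)
    """
    # Start with -inf everywhere so illegal actions can never win the argmax.
    masked = np.full_like(q_values_next, -np.inf)
    for i, legal in enumerate(legal_actions_batch):
        masked[i, legal] = q_values_next[i, legal]
    return np.argmax(masked, axis=1)

# Example: action 1 has the highest Q in row 0 but is illegal there,
# so the masked argmax falls back to the best *legal* action.
q = np.array([[1.0, 5.0, 3.0],
              [2.0, 0.5, 4.0]])
best = masked_best_actions(q, [[0, 2], [0, 1]])  # → [2, 0]
```

This only changes which action the target network is evaluated at; the rest of the train() step would stay the same.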

Yes, I did push the legal_actions of the next state into the buffer in my own implementation, basically adding another element to the [Transition unit](https://github.com/datamllab/rlcard/blob/6df745d3404a97d884640966a3b3a1d059c5fd0f/rlcard/agents/dqn_agent.py#L34). This is much faster than re-computing...
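Storing the next state's legal actions alongside each transition could look like the sketch below. The field names are illustrative and may not match RLCard's actual `Transition` definition; the point is just that the extra element is recorded once at acting time instead of being recomputed at training time.

```python
from collections import namedtuple

# Assumed extension of a typical DQN transition tuple: one extra field,
# next_legal_actions, captured when the transition is pushed to the buffer.
Transition = namedtuple(
    "Transition",
    ["state", "action", "reward", "next_state", "next_legal_actions", "done"],
)

# Example: record which actions will be legal in the next state.
t = Transition(
    state=[0.0, 1.0],
    action=1,
    reward=0.5,
    next_state=[1.0, 0.0],
    next_legal_actions=[0, 2],  # stored here, so train() needn't recompute it
    done=False,
)
```

At training time, `t.next_legal_actions` can then feed directly into the masked argmax over the next state's Q-values.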

Interesting. I'll give your implementation a try in my game when I have time and compare it with my own. From looking at the code, they're doing the same thing....