Fernando
In `game.py`, lines 62-72:

```python
def current_state(self):
    ....
    square_state[0][move_curr // self.width, move_curr % self.height] = 1.0
    ....
```

I assume you actually mean `[move_curr // self.width, move_curr % self.width]`....
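To illustrate the point: for a flat move index on a row-major board, both the row and the column are derived from the board's width; the height never enters the mapping. A minimal sketch (the helper name `move_to_location` is illustrative, not from the repository):

```python
def move_to_location(move, width):
    """Map a flat move index to (row, col) on a row-major board.

    Row-major flattening is move = row * width + col, so both the
    quotient and the remainder are taken with respect to width.
    """
    return move // width, move % width

# On a 3x4 board (height=3, width=4), flat index 7 is row 1, col 3.
print(move_to_location(7, 4))  # (1, 3)
```

Using `% self.height` only happens to work when the board is square, which is why the bug can go unnoticed in testing.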
The current implementation can be found in #2466. In fact, iteratively calling stochastic gradient descent is quite inefficient on distributed frameworks like Mars. Potential solutions include: 1. Zhuang, Yong, et...
### 🚀 Feature

When instantiating an RL algorithm, e.g., PPO, allow a static or pre-trained model to be passed in as a `features_extractor`.

### Motivation

Currently, one can customize the feature...
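One way to approximate this today is to freeze a pre-trained network's weights so the RL optimizer leaves them untouched, then plug it in as the extractor. A minimal PyTorch sketch; the class and helper names (`PretrainedCNN`, `freeze`) are illustrative, not part of any library API:

```python
import torch.nn as nn


class PretrainedCNN(nn.Module):
    """Stand-in for a model whose weights were trained elsewhere."""

    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

    def forward(self, x):
        return self.body(x)


def freeze(module: nn.Module) -> nn.Module:
    # Disable gradients so the policy optimizer cannot update these weights.
    for p in module.parameters():
        p.requires_grad = False
    return module.eval()


extractor = freeze(PretrainedCNN())
```

In Stable-Baselines3 specifically, a custom extractor class can already be supplied via `policy_kwargs=dict(features_extractor_class=...)`; the gap this request addresses is passing an already-instantiated (and possibly frozen) model rather than a class.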
Hi, I am wondering whether the oscillation during the training phase comes from the fact that you only include down-sampling layers in your actor nets, since in partially observable domains,...