LightZero
LightZero copied to clipboard
Stochastic MuZero MLP Issues Related to Chance Space
When I try to train a model using stochastic muzero MLP with chance encoder, I am getting error related to indexing at this line: https://github.com/opendilab/LightZero/blob/f1511fb5cdda4d31e61c5831c877d36215b49b22/lzero/model/stochastic_muzero_model_mlp.py#L172
This is because _dynamics
receives chance as input, so the size considered while encoding should be self.chance_space_size
, as in the conv model.
When I fixed that, I faced another issue of matrix size mismatch, which is caused by wrong dimension in this line: https://github.com/opendilab/LightZero/blob/f1511fb5cdda4d31e61c5831c877d36215b49b22/lzero/model/stochastic_muzero_model_mlp.py#L220
To fix this, I changed dimension passed to dynamics_network
initialisation to self.chance_space_size
here: https://github.com/opendilab/LightZero/blob/f1511fb5cdda4d31e61c5831c877d36215b49b22/lzero/model/stochastic_muzero_model_mlp.py#L116
After this, I faced another issue of matrix size mismatch, which is caused by using dynamics_network
in _afterstate_dynamics
function. In conv
model, it uses afterstate_dynamics_network
. Link: https://github.com/opendilab/LightZero/blob/f1511fb5cdda4d31e61c5831c877d36215b49b22/lzero/model/stochastic_muzero_model_mlp.py#L277
So I also changed this line to use afterstate_dynamics_network
.
After making these changes, model is training and loss is going down and I didn't face any other error.
I think there should be either a new variable for chance_encoding_dim
or chance encoding should be fixed to either one-hot or not one-hot in _dynamics
function.
Please confirm whether this is correct way to fix or did I miss anything?
I will be glad to submit a PR with these changes if required.