Stoix
Add Stochastic MuZero
What?
Added minimal support for Stochastic MuZero, as requested in issue #77.
Why?
To be able to train on stochastic environments such as 2048, poker, ...
How?
Added Afterstate and Encoder models, along with the configurations needed to run them. Only MLP variants are implemented, not CNN ones.
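For readers unfamiliar with the two new model types, here is a minimal pure-JAX sketch of what an MLP afterstate model and an MLP encoder roughly compute. It is not the code in this PR: `init_mlp`, `mlp`, `afterstate_dynamics` and `encode_chance` are hypothetical names, the layer sizes are arbitrary, and the straight-through one-hot code is a simplification of the encoder described in the paper.

```python
import jax
import jax.numpy as jnp


def init_mlp(key, sizes):
    """Create a list of (weights, bias) pairs for a small MLP."""
    keys = jax.random.split(key, len(sizes) - 1)
    return [
        (jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
        for k, m, n in zip(keys, sizes[:-1], sizes[1:])
    ]


def mlp(params, x):
    """Apply the MLP with ReLU on every layer except the last."""
    for w, b in params[:-1]:
        x = jax.nn.relu(x @ w + b)
    w, b = params[-1]
    return x @ w + b


def afterstate_dynamics(params, state, action_one_hot):
    """Afterstate model (hypothetical): (latent state, action) -> latent afterstate."""
    return mlp(params, jnp.concatenate([state, action_one_hot], axis=-1))


def encode_chance(params, observation):
    """Encoder (hypothetical): observation -> one-hot chance code."""
    logits = mlp(params, observation)
    probs = jax.nn.softmax(logits, axis=-1)
    one_hot = jax.nn.one_hot(jnp.argmax(logits, axis=-1), logits.shape[-1])
    # Straight-through estimator: the forward value is the one-hot code,
    # while gradients flow through the softmax probabilities.
    code = probs + jax.lax.stop_gradient(one_hot - probs)
    return code, probs
```

The key point is that the encoder takes an observation as input, not a latent state, and returns a discrete code that the dynamics model can consume together with the afterstate.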
Fixes necessary
- In the loss function from the last commit, the encoder must receive an Observation, as described in the paper:
And here:
The pseudocode does the same, at line 931.
I don't know how to get the observation and pass it to the encoder in your code.
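As a starting point for discussion, here is a rough sketch of how the unroll inside the loss could hand the next observation to the encoder at every step. It assumes the sampled sequence exposes observations as an array with a leading time axis of length K + 1 and actions with length K. All names (`unroll_inputs`, `unroll`, `encoder_fn`, `afterstate_fn`, `dynamics_fn`) are hypothetical and are not Stoix functions.

```python
import jax
import jax.numpy as jnp


def unroll_inputs(observations, actions):
    """Split a sampled sequence into the pieces an unroll needs.

    observations: [K + 1, obs_dim], i.e. o_t ... o_{t+K}
    actions:      [K],              i.e. a_t ... a_{t+K-1}
    """
    root_obs = observations[0]   # fed to the representation network
    next_obs = observations[1:]  # o_{t+k+1}, fed to the encoder at step k
    return root_obs, (actions, next_obs)


def unroll(encoder_fn, afterstate_fn, dynamics_fn, root_state, actions, next_obs):
    """Unroll the model, giving the encoder the next observation at each step."""

    def step(state, inputs):
        action, obs_tp1 = inputs
        afterstate = afterstate_fn(state, action)
        # The chance code comes from the real next observation,
        # not from the latent state.
        chance_code = encoder_fn(obs_tp1)
        next_state = dynamics_fn(afterstate, chance_code)
        return next_state, (afterstate, chance_code)

    # scan walks over actions and next observations in lockstep.
    return jax.lax.scan(step, root_state, (actions, next_obs))
```

With this layout, the observation at index 0 is only used to build the root latent state, and the observations at indices 1 to K line up one-to-one with the K unroll steps. The stacked chance codes returned by the scan can then serve as targets for the afterstate's chance prediction.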