
Add Stochastic MuZero

Open: ipsec opened this issue 2 months ago · 1 comment

What?

Added minimal support for Stochastic MuZero, as requested in issue #77.

Why?

To be able to train on stochastic environments such as 2048, poker, etc.

How?

Added Afterstate and Encoder models, along with the configurations needed to run them. Only MLP versions are implemented; CNN versions are not.
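For reviewers less familiar with the paper, here is a minimal NumPy sketch of how the two new kinds of model sit next to MuZero's dynamics function. It is not the code from this PR: every name (`afterstate_dynamics`, `afterstate_prediction`, `encode_chance`, `dynamics`), the layer sizes, and the use of plain NumPy instead of JAX/Flax are assumptions chosen for illustration. The afterstate dynamics maps (state, action) to an afterstate. The afterstate prediction returns a Q value and logits over chance outcomes. The encoder maps an observation to a one-hot chance code. The dynamics function maps (afterstate, chance code) to the next latent state and a reward.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
OBS_DIM, LATENT_DIM, NUM_ACTIONS, NUM_CHANCE_CODES, HIDDEN = 16, 8, 4, 6, 32


def init_mlp(sizes, rng):
    """Random (weight, bias) pairs for a plain MLP with the given layer sizes."""
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def mlp(params, x):
    """ReLU MLP forward pass; the last layer is linear."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


rng = np.random.default_rng(0)
params = {
    # phi(s, a) -> afterstate
    "afterstate_dynamics": init_mlp([LATENT_DIM + NUM_ACTIONS, HIDDEN, LATENT_DIM], rng),
    # psi(as) -> (Q value, logits over chance outcomes)
    "afterstate_prediction": init_mlp([LATENT_DIM, HIDDEN, 1 + NUM_CHANCE_CODES], rng),
    # e(o) -> logits over chance codes
    "encoder": init_mlp([OBS_DIM, HIDDEN, NUM_CHANCE_CODES], rng),
    # g(as, c) -> (next latent state, reward)
    "dynamics": init_mlp([LATENT_DIM + NUM_CHANCE_CODES, HIDDEN, LATENT_DIM + 1], rng),
}


def afterstate_dynamics(state, action):
    """Concatenate the latent state with a one-hot action and map it to an afterstate."""
    a = np.eye(NUM_ACTIONS)[action]
    return mlp(params["afterstate_dynamics"], np.concatenate([state, a], axis=-1))


def afterstate_prediction(afterstate):
    """Return (q_value, chance_logits) for an afterstate."""
    out = mlp(params["afterstate_prediction"], afterstate)
    return out[..., 0], out[..., 1:]


def encode_chance(observation):
    """Quantize encoder logits to a one-hot chance code (forward pass only)."""
    logits = mlp(params["encoder"], observation)
    # Training the encoder would additionally need a straight-through gradient.
    return np.eye(NUM_CHANCE_CODES)[np.argmax(logits, axis=-1)]


def dynamics(afterstate, chance_code):
    """Return (next_state, reward) from an afterstate and a chance code."""
    out = mlp(params["dynamics"], np.concatenate([afterstate, chance_code], axis=-1))
    return out[..., :LATENT_DIM], out[..., LATENT_DIM]
```

One step of the latent model then chains `afterstate_dynamics`, `afterstate_prediction`, `encode_chance` (or a sampled code at search time), and `dynamics`, in that order.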

Necessary fixes

  1. In the loss function from the last commit, the encoder must receive an observation, as described in the paper: [image]

And here: [image]

The pseudocode does the same at line 931.

I don't know how to get the observation and pass it to the encoder in your code.
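One possible approach, sketched under assumptions: if the sampled training sequence stores the observation for every time step, the encoder input for unroll step k can be sliced out of it as o_{t+k+1}, the observation that follows action a_{t+k}. The paper writes the encoder as conditioning on o_{≤t+k+1}; this sketch uses only the single observation o_{t+k+1}. The helper names `encoder_inputs` and `chance_loss` are hypothetical. The loss shown is only the cross-entropy between the afterstate prediction's chance distribution σ and the encoder's one-hot code. The paper's chance loss also includes a VQ-VAE-style commitment term, which this sketch omits.

```python
import numpy as np


def encoder_inputs(observations, start, num_unroll_steps):
    """Pick the observation the chance encoder sees at each unroll step k = 0..K-1.

    Assumption: observations[i] is o_i from the stored trajectory, so the chance
    outcome that follows action a_{start+k} is encoded from o_{start+k+1}.
    Returns the selected observations and a mask that is False past the episode end.
    """
    idx = start + 1 + np.arange(num_unroll_steps)
    valid = idx < len(observations)
    # Clamp out-of-range steps to the last observation; the mask drops them from the loss.
    idx = np.minimum(idx, len(observations) - 1)
    return observations[idx], valid


def chance_loss(sigma_logits, encoder_logits, valid):
    """Masked cross-entropy between sigma (from the afterstate prediction) and
    the encoder's one-hot code, averaged over valid unroll steps."""
    codes = np.eye(encoder_logits.shape[-1])[np.argmax(encoder_logits, axis=-1)]
    # Numerically stable log-softmax of sigma.
    shifted = sigma_logits - sigma_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    per_step = -(codes * log_probs).sum(axis=-1)
    return (per_step * valid).sum() / max(valid.sum(), 1)
```

In a JAX training step, the argmax quantization would need a straight-through estimator so that gradients reach the encoder, for example `codes = probs + jax.lax.stop_gradient(one_hot - probs)`.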

  2. In the pseudocode, the value targets are calculated at every unroll step (lines 910 and 948).

I don't know how to do this in your code.
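For reference, the MuZero pseudocode's `make_target` computes an n-step bootstrapped return for every unroll position. It takes the discounted sum of the next `td_steps` rewards plus the discounted search value at the bootstrap index, or zero if the episode has already ended. Below is a NumPy sketch of that loop. `value_targets` and its argument names are hypothetical. It assumes `rewards[i]` is the reward received after the action at step `i`, and that `root_values` and `rewards` have equal length.

```python
import numpy as np


def value_targets(root_values, rewards, start, num_unroll_steps, td_steps, discount):
    """n-step bootstrapped value target for each unroll step k = 0..num_unroll_steps."""
    targets = []
    for current in range(start, start + num_unroll_steps + 1):
        bootstrap = current + td_steps
        if bootstrap < len(root_values):
            # Bootstrap from the search value td_steps ahead.
            value = root_values[bootstrap] * discount ** td_steps
        else:
            # Episode ends before the bootstrap index: no bootstrap term.
            value = 0.0
        # Add the discounted rewards between current and bootstrap (truncated at episode end).
        for i, reward in enumerate(rewards[current:bootstrap]):
            value += reward * discount ** i
        targets.append(value)
    return np.asarray(targets, dtype=np.float32)
```

Positions past the end of the episode get a target of zero, which corresponds to treating them as absorbing states. In a batched JAX setting, the Python loop would typically be replaced by index arithmetic with masks, but the per-step formula stays the same.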

ipsec, May 15 '24 17:05