Stoix

Add Stochastic MuZero

Open • ipsec opened this issue 2 months ago • 1 comment

What?

Added minimal support for Stochastic MuZero (issue #77).

Why?

To be able to train agents on stochastic environments such as 2048, poker, etc.

How?

Added Afterstate and Encoder models, along with the configurations needed to run them. Only MLP versions of these models are implemented; CNN versions are not.
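For reference, here is a minimal pure-JAX sketch of the two afterstate functions in the paper's notation. The afterstate dynamics φ(s, a) maps a latent state and an action to a latent afterstate. The afterstate prediction ψ(as) outputs a Q-value and a distribution σ over chance outcomes. The helper names (`init_mlp`, `mlp`, `afterstate_dynamics`, `afterstate_prediction`), the layer sizes, the codebook size, and the one-hot action encoding are hypothetical choices for illustration. They are not the Stoix API and not the models added in this PR.

```python
import jax
import jax.numpy as jnp


def init_mlp(key, sizes):
    """Initialise (weight, bias) pairs for an MLP with the given layer sizes."""
    keys = jax.random.split(key, len(sizes) - 1)
    return [
        (jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
        for k, m, n in zip(keys, sizes[:-1], sizes[1:])
    ]


def mlp(params, x):
    """ReLU hidden layers followed by a linear output layer."""
    for w, b in params[:-1]:
        x = jax.nn.relu(x @ w + b)
    w, b = params[-1]
    return x @ w + b


def afterstate_dynamics(params, state, action, num_actions):
    """phi(s, a): latent state plus one-hot action -> latent afterstate."""
    x = jnp.concatenate([state, jax.nn.one_hot(action, num_actions)], axis=-1)
    return mlp(params, x)


def afterstate_prediction(params, afterstate, codebook_size):
    """psi(as): latent afterstate -> (Q-value, chance distribution sigma)."""
    out = mlp(params, afterstate)
    q_value = out[..., 0]
    sigma = jax.nn.softmax(out[..., 1:1 + codebook_size], axis=-1)
    return q_value, sigma


# Usage with hypothetical sizes: a batch of 2 latent states.
latent_dim, num_actions, codebook_size = 8, 4, 32
key_dyn, key_pred = jax.random.split(jax.random.PRNGKey(0))
dyn_params = init_mlp(key_dyn, [latent_dim + num_actions, 64, latent_dim])
pred_params = init_mlp(key_pred, [latent_dim, 64, 1 + codebook_size])

state = jnp.zeros((2, latent_dim))
action = jnp.array([0, 3])
afterstate = afterstate_dynamics(dyn_params, state, action, num_actions)
q_value, sigma = afterstate_prediction(pred_params, afterstate, codebook_size)
```

Packing the Q-value and the σ logits into one output head keeps the sketch short. Separate heads would work equally well.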

Necessary fixes

In the loss function from the last commit, the encoder must receive an observation, as described in the paper:

And here:

The same appears in the pseudocode at line 931.

I don't know how to get the observation and pass it to the encoder in your code.
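Below is one possible way to build the encoder input. It assumes each sampled trajectory stores its observations as an array of shape `[T, obs_dim]`, which is an assumption about the buffer layout and not necessarily how Stoix stores them. At unroll step k from start index t, the encoder consumes the observation that follows the chance event, o_{t+k+1}, so the inputs are the slice `observations[t+1 : t+K+1]`. The paper's encoder conditions on o_{≤t+k+1}; using only the latest observation is a simplification. The sketch uses a hypothetical linear encoder and turns its softmax output into a one-hot chance code with a straight-through estimator: a hard one-hot on the forward pass and the softmax gradient on the backward pass. `encoder_logits`, `straight_through_one_hot` and `chance_codes_for_unroll` are illustrative names.

```python
import jax
import jax.numpy as jnp


def encoder_logits(params, observation):
    """Hypothetical linear encoder e(o): observation -> logits over the chance codebook."""
    w, b = params
    return observation @ w + b


def straight_through_one_hot(logits):
    """Hard one-hot code on the forward pass, softmax gradient on the backward pass."""
    probs = jax.nn.softmax(logits, axis=-1)
    hard = jax.nn.one_hot(jnp.argmax(probs, axis=-1), probs.shape[-1])
    return probs + jax.lax.stop_gradient(hard - probs), probs


def chance_codes_for_unroll(params, observations, start, num_unroll_steps):
    """Chance codes for unroll steps k = 0..K-1 starting at index t = start.

    observations: [T, obs_dim] for one sampled trajectory. Step k uses o_{t+k+1},
    the observation after the chance event. Assumes start + num_unroll_steps < T;
    otherwise dynamic_slice_in_dim clamps the start index and the slice is shifted.
    """
    next_obs = jax.lax.dynamic_slice_in_dim(observations, start + 1, num_unroll_steps, axis=0)
    return straight_through_one_hot(encoder_logits(params, next_obs))


# Usage: 6 observations of size 3 and a codebook of size 5 (hypothetical sizes).
observations = jnp.arange(18.0).reshape(6, 3)
params = (jnp.eye(3, 5), jnp.zeros(5))
codes, probs = chance_codes_for_unroll(params, observations, start=1, num_unroll_steps=3)
```

The soft `probs` are returned alongside the hard codes because a loss term comparing the encoder output with the chance distribution σ would typically need them.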

In the pseudocode, the value targets are computed at every unroll step (lines 910 and 948).

I don't know how to do this in your code.
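Here is a minimal sketch of per-unroll-step value targets, following the n-step target construction in the original MuZero pseudocode (`make_target`). For each index i, the target is the discounted sum of the next `td_steps` rewards plus the discounted root value `td_steps` steps ahead, with zero past the end of the trajectory. The sketch assumes a single trajectory where `rewards[i]` is the reward after the action at index i and `root_values[i]` is the MCTS root value at index i. Two things are assumptions: that this matches lines 910 and 948 of the Stochastic MuZero pseudocode exactly, and how it would map onto Stoix's buffer. The function names are hypothetical.

```python
import jax
import jax.numpy as jnp


def n_step_value_targets(rewards, root_values, discount, td_steps):
    """n-step bootstrapped value target for every index of one trajectory.

    rewards[i]: reward received after the action taken at index i.
    root_values[i]: search (root) value estimate at index i.
    Rewards and values past the end of the trajectory count as zero.
    """
    T = rewards.shape[0]
    padded_r = jnp.concatenate([rewards, jnp.zeros(td_steps, rewards.dtype)])
    padded_v = jnp.concatenate([root_values, jnp.zeros(td_steps, root_values.dtype)])
    idx = jnp.arange(T)
    offsets = jnp.arange(td_steps)
    # [T, td_steps] window of the next td_steps rewards for each index.
    reward_windows = padded_r[idx[:, None] + offsets[None, :]]
    discounted_rewards = jnp.sum(reward_windows * discount ** offsets, axis=1)
    # Bootstrap from the root value td_steps ahead (zero past the end).
    bootstrap = padded_v[idx + td_steps] * discount ** td_steps
    return discounted_rewards + bootstrap


def unroll_value_targets(rewards, root_values, start, num_unroll_steps, discount, td_steps):
    """Value targets for unroll steps k = 0..K starting at index start."""
    targets = n_step_value_targets(rewards, root_values, discount, td_steps)
    # Steps beyond the end of the trajectory get a target of 0 (absorbing state).
    padded = jnp.concatenate([targets, jnp.zeros(num_unroll_steps + 1, targets.dtype)])
    return jax.lax.dynamic_slice(padded, (start,), (num_unroll_steps + 1,))


rewards = jnp.array([1.0, 1.0, 1.0])
root_values = jnp.array([10.0, 20.0, 30.0])
all_targets = n_step_value_targets(rewards, root_values, discount=0.5, td_steps=2)  # [9.0, 1.5, 1.0]
unrolled = unroll_value_targets(rewards, root_values, start=1, num_unroll_steps=2,
                                discount=0.5, td_steps=2)  # [1.5, 1.0, 0.0]
```

This computes the targets for the whole trajectory once and then slices them. Only `start` may be a traced value; `td_steps` and `num_unroll_steps` must stay static under `jax.jit`. Wrapping the call in `jax.vmap` would extend it to a batch of trajectories.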

May 15 '24 17:05 ipsec