catanatron Implementing MaskedPPO Player


Open zarns opened this issue 4 months ago • 2 comments

This is where I'm at currently; more updates to come after I train the CustomCNN version for a while. I trained the non-CNN version of the model for 10,000,000 timesteps against the ValueFunctionPlayer, and it then had an 80% win rate against random. Not incredible progress, but at least the training and playing scripts are running! Excited to see if this feature_extractor CNN will help.
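For context, here is a minimal sketch of the action-masking idea that the "Masked" in MaskedPPO refers to: illegal actions get probability zero before the policy samples a move. It uses only the Python standard library, and the names `masked_softmax` and `sample_action` are illustrative assumptions, not taken from catanatron or from the training script described above.

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax over logits where masked-out (illegal) actions get probability 0.

    logits: list of floats, one per action (hypothetical policy outputs).
    mask:   list of bools, True where the action is legal in the current state.
    """
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be legal")

    # Subtract the largest legal logit for numerical stability.
    legal_max = max(l for l, m in zip(logits, mask) if m)
    exps = [math.exp(l - legal_max) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, mask, rng=random):
    """Sample an action index from the masked distribution."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [1.0, 2.0, 3.0]
    mask = [True, False, True]  # action 1 is illegal in this toy state
    print(masked_softmax(logits, mask))
    print(sample_action(logits, mask))
```

Because the illegal action's weight is exactly zero, the sampler can never pick it, so the agent does not have to learn legality from penalties.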

Oct 06 '24 23:10 zarns
