catanatron
Implementing MaskedPPO Player
This is where I'm at currently; more updates to come after I train the CustomCNN version for a while. I trained the non-CNN version of the model for 10,000,000 timesteps against the ValueFunctionPlayer, after which it had an 80% win rate against a random player. Not incredible progress, but at least the training and playing scripts are running! Excited to see if this feature_extractor CNN will help.
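For anyone unfamiliar with the "masked" part of MaskedPPO, here is a minimal standard-library sketch of the general idea: invalid actions get zero probability before an action is sampled. This is not the code in this PR. The names `masked_softmax` and `sample_action` are hypothetical, and the logits and mask are made-up values rather than anything produced by the trained model.

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax over logits; actions whose mask entry is False get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    max_logit = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - max_logit) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, mask, rng=random):
    """Sample an action index using only the valid (unmasked) actions."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Illustrative values only: three actions, the middle one is illegal this turn.
probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
action = sample_action([1.0, 2.0, 3.0], [True, False, True], random.Random(0))
```

The point of masking at the distribution level (rather than penalizing illegal moves through the reward) is that the policy never spends probability mass on actions the game would reject.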