Masking illegal actions
Is there any interface for masking illegal actions?
Ideally, I'd like the agent network to apply the softmax only over the set of legal moves (which can be computed as a function of the current state) and set all other action probabilities to zero. For example, in tic-tac-toe you cannot play on top of an existing marker.
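
To make the request concrete, here is a rough NumPy sketch of the behaviour described above. The names `masked_softmax` and `legal_mask` are hypothetical and not part of any existing interface, and the flat 9-cell board encoding is only an assumption for illustration.

```python
import numpy as np

def masked_softmax(logits, legal_mask):
    """Softmax restricted to legal actions; illegal actions get probability 0.

    Hypothetical helper: `legal_mask` is a boolean array, True where legal.
    """
    logits = np.asarray(logits, dtype=np.float64)
    legal_mask = np.asarray(legal_mask, dtype=bool)
    if not legal_mask.any():
        raise ValueError("at least one action must be legal")
    # Set illegal logits to -inf so that exp() maps them to exactly 0.
    masked = np.where(legal_mask, logits, -np.inf)
    # Subtract the largest legal logit for numerical stability.
    exp = np.exp(masked - masked.max())
    return exp / exp.sum()

# Tic-tac-toe example (assumed encoding): 0 = empty, 1 and 2 = player markers.
board = np.array([1, 0, 2, 0, 1, 0, 0, 0, 2])
legal_mask = board == 0          # legal moves derived from the current state
logits = np.zeros(9)             # stand-in for the network's raw outputs
probs = masked_softmax(logits, legal_mask)
```

Masking the logits before the softmax, rather than zeroing probabilities afterwards and renormalising, keeps the result a valid distribution in a single step. Whether the agent network exposes a hook for this is exactly what I am asking.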