selfmonitoring-agent
selfmonitoring-agent copied to clipboard
Categorical Sampling during training
I wonder what the differences are between using categorical sampling to choose the next action during training, vs using a student-forcing method. Is it the same?
Also, does your code provide a way of training with a teacher-forcing regime?
Thank you!