rlai-exercises
Solve numpy overflow warning in GradientBandit.
Expected Behavior
Running GradientBandit shouldn't raise any overflow warnings.
Current Behavior
The warning "RuntimeWarning: overflow encountered in double_scalars" is raised occasionally in the following lines:
estimators.py:110
updated_numerical_preference[action_selected] = self.numerical_preference[action_selected] + self.alpha * (r - baseline) * (1 - probabilities[action_selected])
estimators.py:111
updated_numerical_preference = self.numerical_preference - self.alpha * (r - baseline) * probabilities
estimators.py:120
return np.random.choice(a=self.k_actions, p=self.get_actions_probabilities())
estimators.py:102
self.average_reward = qn + self.alpha * (r - qn)
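One possible mitigation for the warning at estimators.py:120, sketched under the assumption that get_actions_probabilities() computes a softmax over self.numerical_preference (the issue does not show that method). Subtracting the maximum preference before exponentiating leaves the probabilities unchanged but keeps np.exp from overflowing. The name stable_softmax is hypothetical and not part of the repository.

```python
import numpy as np


def stable_softmax(preferences):
    """Softmax that shifts by the max so np.exp never sees a large positive input.

    Hypothetical helper: assumes action probabilities are a softmax of the
    numerical preferences, which the issue does not confirm.
    """
    prefs = np.asarray(preferences, dtype=float)
    shifted = prefs - np.max(prefs)      # largest entry becomes 0
    exp_shifted = np.exp(shifted)        # every value lies in (0, 1]
    return exp_shifted / exp_shifted.sum()


# Preferences this large would overflow a naive np.exp(prefs).
with np.errstate(over="raise"):          # turn overflow into an exception
    probabilities = stable_softmax([1000.0, 1001.0, 999.0])
```

This only addresses overflow inside the exponential. If the preferences themselves grow without bound, the update lines can still overflow.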
Steps to Reproduce
- Run Exercise 2.9.py with a GradientBandit with alpha=4.0
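A possible explanation for the warning at estimators.py:102 with alpha=4.0: the update qn + alpha * (r - qn) equals (1 - alpha) * qn + alpha * r, so for alpha=4.0 the old estimate is multiplied by -3 on every step and its magnitude grows geometrically. The sketch below isolates that update with a constant reward of 1.0 and an initial estimate of 0.0, both of which are assumptions made for illustration. steps_until_overflow is a hypothetical helper, not code from the repository.

```python
import numpy as np


def steps_until_overflow(alpha, max_steps, reward=1.0):
    """Return the step at which the baseline update overflows, or None.

    Hypothetical helper: mirrors only the update
    average_reward = qn + alpha * (r - qn) quoted in the issue.
    """
    qn = np.float64(0.0)
    with np.errstate(over="raise"):      # raise instead of warning on overflow
        for step in range(1, max_steps + 1):
            try:
                qn = qn + alpha * (reward - qn)
            except FloatingPointError:
                return step
    return None


diverging = steps_until_overflow(alpha=4.0, max_steps=2000)
converging = steps_until_overflow(alpha=0.1, max_steps=2000)
```

With alpha=0.1 the estimate settles near the reward and no overflow occurs, which suggests the warning may stem from the large step size rather than from NumPy itself.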