Solve numpy overflow warning in GradientBandit.

Open iamhectorotero opened this issue 5 years ago • 0 comments

Expected Behavior

Running GradientBandit shouldn't raise any overflow warnings.

Current Behavior

The warning "RuntimeWarning: overflow encountered in double_scalars" is raised occasionally in the following lines:

estimators.py:110 updated_numerical_preference[action_selected] = self.numerical_preference[action_selected] + self.alpha * (r - baseline) * (1 - probabilities[action_selected])

estimators.py:111 updated_numerical_preference = self.numerical_preference - self.alpha * (r - baseline) * probabilities

estimators.py:120 return np.random.choice(a=self.k_actions, p=self.get_actions_probabilities())

estimators.py:102 self.average_reward = qn + self.alpha * (r - qn)
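
One possible mitigation for the overflow at estimators.py:120, sketched under an assumption: the body of `get_actions_probabilities` is not shown in this issue, so the snippet assumes it computes a softmax over `self.numerical_preference`. The helper name `stable_softmax` is hypothetical. Subtracting the largest preference before exponentiating leaves the probabilities unchanged but keeps `np.exp` from overflowing.

```python
import numpy as np


def stable_softmax(preferences):
    """Softmax that avoids overflow by shifting preferences before np.exp.

    Hypothetical helper; assumes action probabilities are a softmax
    over the numerical preferences.
    """
    prefs = np.asarray(preferences, dtype=np.float64)
    # Shifting by the maximum makes the largest exponent exactly 0,
    # so np.exp never receives a large positive argument.
    shifted = prefs - np.max(prefs)
    exp_shifted = np.exp(shifted)
    return exp_shifted / np.sum(exp_shifted)


# Preferences this large would overflow a naive np.exp(prefs).
probabilities = stable_softmax([1000.0, 0.0, -1000.0])
```

This only protects the probability computation. It does not stop the preferences or the baseline from growing without bound when the step size is too large.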

Steps to Reproduce

  1. Run Exercise 2.9.py with a GradientBandit with alpha=4.0
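
To pin down where the overflow first happens, NumPy's warning can be promoted to an exception with `np.errstate(over="raise")`. The sketch below is not the repository's code: it isolates only the baseline update quoted from estimators.py:102, and the function name `first_overflow_step` and the constant reward sequence are assumptions for illustration. With alpha=4.0 the update becomes `qn_new = -3 * qn + 4 * r`, so any deviation of the baseline from the reward is multiplied by -3 at every step and eventually exceeds the float64 range.

```python
import numpy as np


def first_overflow_step(alpha, rewards):
    """Return the index of the first step whose baseline update overflows.

    Hypothetical helper mirroring `qn + alpha * (r - qn)`; returns None
    if no overflow occurs over the given rewards.
    """
    baseline = np.float64(0.0)
    # Turn "overflow encountered" warnings into FloatingPointError.
    with np.errstate(over="raise"):
        for step, r in enumerate(rewards):
            try:
                baseline = baseline + alpha * (r - baseline)
            except FloatingPointError:
                return step
    return None


rewards = [1.0] * 1000  # assumed constant reward, for illustration only
overflow_step = first_overflow_step(4.0, rewards)
no_overflow = first_overflow_step(0.1, rewards)
```

Under these assumptions `overflow_step` is an integer while `no_overflow` is `None`, which suggests the warning at line 102 comes from the step size itself rather than from a numerical detail of the implementation.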

iamhectorotero, Aug 10 '18 14:08