interactive-intro-rl
Small Refactoring towards Stability
Notes:
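- Replaced L-BFGS-B with CG (conjugate gradient).
  - The only thing L-BFGS-B adds over plain L-BFGS is support for bound constraints. No constraints are used here, so there is no need for it.
  - CG, in contrast, comes with convergence guarantees.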
- Added intercept support to the `OnlineRegression` method, since the `beta_0` variable was previously unused.
- Changed the `np.sum` operation to `np.mean`.
  - A sum may work for small buffer sizes but can easily lead to instability, especially once the intercept is included.
  - The mean is more numerically stable, since it reduces the overall magnitude of the objective being minimized.
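A minimal sketch of the three changes together, assuming the objective is a mean-squared-error linear regression over a buffer of samples `(X, y)`. The actual objective inside `OnlineRegression` is not shown here, and the functions `mse_loss`, `mse_grad` and `fit_linear` are hypothetical names used only for illustration. The sketch calls `scipy.optimize.minimize` with `method="CG"` and no bounds, prepends `beta_0` to the parameter vector, and averages the squared residuals with `np.mean`.

```python
import numpy as np
from scipy.optimize import minimize


def mse_loss(params, X, y):
    """Mean squared error of a linear model with an intercept (hypothetical objective).

    params[0] is the intercept beta_0; params[1:] are the slope weights.
    np.mean keeps the loss on the same scale whatever the buffer size.
    """
    beta_0, beta = params[0], params[1:]
    residuals = y - (beta_0 + X @ beta)
    return np.mean(residuals ** 2)


def mse_grad(params, X, y):
    """Analytic gradient of mse_loss with respect to [beta_0, beta]."""
    beta_0, beta = params[0], params[1:]
    residuals = y - (beta_0 + X @ beta)
    grad_beta_0 = -2.0 * np.mean(residuals)
    grad_beta = -2.0 * (X.T @ residuals) / len(y)
    return np.concatenate(([grad_beta_0], grad_beta))


def fit_linear(X, y):
    """Fit intercept and weights with unconstrained conjugate gradient (no bounds)."""
    x0 = np.zeros(X.shape[1] + 1)  # [beta_0, beta_1, ..., beta_d]
    result = minimize(mse_loss, x0, args=(X, y), jac=mse_grad, method="CG")
    return result.x[0], result.x[1:], result


# Synthetic buffer: y = 2 + X @ [0.5, -1, 3] + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 + X @ np.array([0.5, -1.0, 3.0]) + 0.01 * rng.normal(size=200)

beta_0, beta, result = fit_linear(X, y)
print("converged:", result.success, "beta_0:", round(beta_0, 2), "beta:", np.round(beta, 2))

# Same starting point (all-zero parameters), sum-based vs mean-based objective
start = np.zeros(X.shape[1] + 1)
mean_loss = mse_loss(start, X, y)
sum_loss = np.sum(y ** 2)  # what an np.sum reduction would give at the all-zero start
print(f"sum-based loss: {sum_loss:.1f}, mean-based loss: {mean_loss:.1f}")
```

Because the sum-based loss equals `len(y)` times the mean-based one, its value and gradient grow with the buffer size, whereas the mean keeps them on the same scale regardless of how many samples are buffered.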