
Small Refactoring towards Stability

Open SpyrosMouselinos opened this issue 3 years ago • 0 comments

Notes:


  1. Replaced L-BFGS-B with CG.
  • No bounds or constraints are used, so there is no need for L-BFGS-B.
  • CG, on the other hand, has convergence guarantees.
  2. Added intercept support to the OnlineRegression method, since the beta_0 variable was previously unused.
  3. Changed the np.sum operation to np.mean.
  • Sum may work for small buffer sizes but can easily lead to instability, especially when the intercept is included.
  • Mean is more numerically stable, since dividing by the buffer size reduces the overall magnitude of the objective being minimized and of its gradient.
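
The three changes above can be sketched together as follows. This is a minimal illustration, not the code in the repository: it assumes the objective is a squared-error loss for a linear model, and the names `mse_loss`, `mse_grad` and `fit_linear_cg` are hypothetical. Only `scipy.optimize.minimize` with `method="CG"` and the NumPy calls are real library APIs.

```python
import numpy as np
from scipy.optimize import minimize


def mse_loss(params, X, y):
    """Mean (not summed) squared error of a linear model with an intercept.

    params[0] plays the role of beta_0; params[1:] are the slope coefficients.
    Hypothetical helper, not the repository's OnlineRegression code.
    """
    residuals = X @ params[1:] + params[0] - y
    return np.mean(residuals ** 2)


def mse_grad(params, X, y):
    """Analytic gradient of mse_loss with respect to [beta_0, beta]."""
    residuals = X @ params[1:] + params[0] - y
    grad_beta_0 = 2.0 * np.mean(residuals)
    grad_beta = 2.0 * (X.T @ residuals) / len(y)
    return np.concatenate(([grad_beta_0], grad_beta))


def fit_linear_cg(X, y):
    """Fit intercept and slopes with unconstrained conjugate gradient."""
    x0 = np.zeros(X.shape[1] + 1)  # one extra slot for the intercept
    result = minimize(mse_loss, x0, args=(X, y), jac=mse_grad, method="CG")
    return result.x[0], result.x[1:]


# Synthetic data standing in for the replay buffer (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_beta = np.array([1.5, -2.0, 0.5])
y = X @ true_beta + 4.0 + 0.01 * rng.normal(size=200)

beta_0, beta = fit_linear_cg(X, y)
```

With `np.sum` in place of `np.mean`, the loss and its gradient would grow linearly with the number of rows in `X`, so the same optimizer settings would behave differently as the buffer fills up.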

SpyrosMouselinos, Jun 07 '21 11:06