UCRL2/UCFH confidence intervals are incorrect

Open vzhuang opened this issue 5 years ago • 2 comments

As per Jaksch et al. (2010), the confidence intervals for UCRL2 use t_k := the timestep at the start of episode k. However, run_finite_tabular_experiment in experiment.py passes the episode index k instead of the timestep t_k.

UCFH is also affected by this bug.

vzhuang, Jan 29 '20 22:01

Are you 100% sure this is a bug?

If the episodes are of fixed length (they are) then you can compute t_k from just k as (k * episode_length).

My belief is this is what is happening?

iosband, Jan 30 '20 09:01

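Right, it's a simple fix. But since the time sits inside a log factor, passing k instead of k * episode_length can't be "fixed" by adjusting the scaling constant: log(k * episode_length) = log(k) + log(episode_length), so the error is additive inside the log, not a constant multiple. I'm guessing it has at least a small impact on your results, depending on whether you tune the scaling factor.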

vzhuang, Jan 30 '20 18:01
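
To make the log-factor point concrete, here is a minimal sketch, not the code in experiment.py. It assumes the reward confidence radius from Jaksch et al. (2010), sqrt(7 log(2 S A t / delta) / (2 max(1, N))), and the fixed-length conversion t_k = k * episode_length discussed above, with k starting at 1. The function names and the default values for S, A, and delta are hypothetical and chosen only for illustration.

```python
import math


def reward_radius(t, n_visits, n_states=5, n_actions=2, delta=0.05, scale=1.0):
    """UCRL2-style reward confidence radius at time t (assumed form, t >= 1)."""
    log_term = math.log(2 * n_states * n_actions * t / delta)
    return scale * math.sqrt(7 * log_term / (2 * max(1, n_visits)))


def start_timestep(k, episode_length):
    """Timestep at the start of episode k for fixed-length episodes (k >= 1)."""
    return k * episode_length


def radius_ratio(k, episode_length, n_visits=10):
    """Radius computed with t_k divided by the radius computed with k."""
    t_k = start_timestep(k, episode_length)
    return reward_radius(t_k, n_visits) / reward_radius(k, n_visits)


if __name__ == "__main__":
    # If the ratio were the same for every k, a single scaling constant
    # could absorb the difference. It is not: it shrinks as k grows.
    for k in (1, 10, 100, 1000):
        print(k, round(radius_ratio(k, episode_length=10), 4))
```

Under these assumptions the ratio depends on k, so any single tuned scale that matches the correct radius early on will mismatch it later, and vice versa.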