PILCO Computation of cross-covariance of state and action

Open dvtailor opened this issue 5 years ago • 0 comments

Judging only from the docstrings of the relevant functions, I think there is a discrepancy with the paper. I have not checked the math in the code itself, so I may be wrong.

The V returned by RbfController.compute_action() in controllers.py corresponds to Cov[x,u].

Tracing back to MGPR.predict_given_factorizations() in models/mgpr.py, the docstrings seem to indicate that:

V = cov[x,x]^{-1} @ cov[x,pi] @ cov[pi,u]

where pi denotes the action before squashing.

Section 5.5 of the 2015 paper, however, gives:

V = cov[x,pi] @ cov[pi,pi]^{-1} @ cov[pi,u]

Are these expressions equivalent, or have I misread something? Thanks!
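
For what it's worth, here is a minimal NumPy sketch of the comparison. It rests on a toy linear-Gaussian model that is my own assumption and is not taken from the paper or the repository: x ~ N(0, S_x), pi = A x + noise, and u = B pi, where the linear map B stands in for the (actually nonlinear) squashing. All names (S_x, A, B, Q) are hypothetical. In this toy setting the true Cov[x,u] is known in closed form, so both expressions, taken exactly as written above, can be checked against it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_u = 3, 2

# Toy model (assumption, not from the paper or the repo):
#   x ~ N(0, S_x),  pi = A x + eps with eps ~ N(0, Q),  u = B pi
L = rng.normal(size=(n_x, n_x))
S_x = L @ L.T + n_x * np.eye(n_x)   # symmetric positive definite cov[x,x]
A = rng.normal(size=(n_u, n_x))
Q = 0.1 * np.eye(n_u)
B = rng.normal(size=(n_u, n_u))

# Exact covariances implied by the toy model
cov_x_pi = S_x @ A.T
cov_pi_pi = A @ S_x @ A.T + Q
cov_pi_u = cov_pi_pi @ B.T
cov_x_u = S_x @ A.T @ B.T           # ground-truth Cov[x,u]

# Expression as quoted from the paper:
#   cov[x,pi] @ cov[pi,pi]^{-1} @ cov[pi,u]
V_paper = cov_x_pi @ np.linalg.solve(cov_pi_pi, cov_pi_u)

# Expression as I read it from the docstrings:
#   cov[x,x]^{-1} @ cov[x,pi] @ cov[pi,u]
V_docstring = np.linalg.solve(S_x, cov_x_pi) @ cov_pi_u

paper_matches = np.allclose(V_paper, cov_x_u)
docstring_matches = np.allclose(V_docstring, cov_x_u)
print("paper expression equals Cov[x,u]:    ", paper_matches)
print("docstring expression equals Cov[x,u]:", docstring_matches)
```

In this toy model the two expressions are not the same matrix in general, so if the code is correct, I suspect my reading of the docstrings (for example, which factors are already scaled by an inverse covariance) is what is off.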

Oct 07 '19 03:10 dvtailor
