PILCO
Computation of cross-covariance of state and action
Looking only at the docstrings of the relevant functions, I think I have noticed a discrepancy with the paper. I haven't checked the math in the code itself, so I may be wrong.
`V`, returned by `RbfController.compute_action()` in `controllers.py`, corresponds to Cov[x,u]. Backtracking to `MGPR.predict_given_factorizations()` in `models/mgpr.py`, I think the docstrings indicate that:
V = cov[x,x]^{-1} @ cov[x,pi] @ cov[pi,u]
where pi denotes the action before squashing.
Section 5.5 of the 2015 paper gives:
V = cov[x,pi] @ cov[pi,pi]^{-1} @ cov[pi,u]
Are these expressions equivalent, or have I misread something? Thanks!
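A quick numerical check can help separate the two expressions. The sketch below is only an illustration under a simplifying assumption: it replaces the controller and the squashing function with a hypothetical linear-Gaussian chain x → pi → u (the real PILCO controller and squashing are nonlinear and use moment matching), so every covariance is known exactly. In that setting the paper's expression reproduces the true cov[x,u], while the expression as read from the docstrings generally does not. The last check tests one possible reading, an assumption not confirmed here from the code: that `V` is stored premultiplied by cov[x,x]^{-1}, with a cov[pi,pi]^{-1} factor appearing in the squashing step. All matrix names are hypothetical.

```python
import numpy as np

# Hypothetical linear-Gaussian chain x -> pi -> u, used only to compare the
# expressions; it is NOT the PILCO controller or squashing function.
rng = np.random.default_rng(0)
dx, dp = 3, 2  # state and pre-squash action dimensions (u also has dp dims)


def random_spd(n):
    """Return a random, well-conditioned symmetric positive-definite matrix."""
    L = rng.standard_normal((n, n))
    return L @ L.T + n * np.eye(n)


S_xx = random_spd(dx)                 # cov[x, x]
A = rng.standard_normal((dp, dx))     # pi = A x + noise
B = rng.standard_normal((dp, dp))     # u  = B pi + noise
S_noise_pi = random_spd(dp)           # covariance of the noise on pi

# Exact covariances implied by the linear-Gaussian chain
S_xp = S_xx @ A.T                     # cov[x, pi]
S_pp = A @ S_xx @ A.T + S_noise_pi    # cov[pi, pi]
S_pu = S_pp @ B.T                     # cov[pi, u]
S_xu = S_xx @ A.T @ B.T               # cov[x, u], ground truth

# Paper: cov[x,pi] cov[pi,pi]^{-1} cov[pi,u]
paper = S_xp @ np.linalg.solve(S_pp, S_pu)
# As read from the docstrings: cov[x,x]^{-1} cov[x,pi] cov[pi,u]
as_read = np.linalg.solve(S_xx, S_xp) @ S_pu
# Assumed reading: cov[x,x]^{-1} cov[x,pi] cov[pi,pi]^{-1} cov[pi,u]
scaled = np.linalg.solve(S_xx, S_xp) @ np.linalg.solve(S_pp, S_pu)

print(np.allclose(paper, S_xu))                          # True
print(np.allclose(as_read, S_xu))                        # generally False
print(np.allclose(scaled, np.linalg.solve(S_xx, S_xu)))  # True
```

If that assumed reading holds, the two formulas would agree up to a left multiplication by cov[x,x], i.e. cov[x,u] = cov[x,x] @ V; whether the code actually includes the cov[pi,pi]^{-1} factor would still need to be confirmed in the implementation.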