PILCO

Cost for trajectory following

Open Pengxiao-Gao opened this issue 4 years ago • 3 comments

Hi, I'm trying to use PILCO for path tracking in my graduation thesis, but so far the control results are not ideal. I think they could be improved with a reward designed for trajectory following. Do you know an easy way to do this? Thanks a lot for the help.

Stefan

Pengxiao-Gao · Apr 27 '20 14:04

I'm wondering the same thing. Did you ever figure this out?

maxvanmeer · Jun 04 '20 11:06

The reward is calculated here, so perhaps you could modify it to add what you need?
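Assuming the reward being modified is PILCO's usual saturating exponential reward exp(-½ (x - x_target)ᵀ W (x - x_target)), its expectation under a Gaussian state x ~ N(m, S) has the closed form |I + S W|^(-1/2) · exp(-½ (m - x_target)ᵀ W (I + S W)⁻¹ (m - x_target)). A minimal sketch of a trajectory-following variant simply evaluates that expression against a target that depends on the time step t. This is standalone NumPy, not the package's own reward class; the names `expected_exp_reward`, `reference` and `trajectory_reward`, the circular path, and the `dt`/`omega` values are all hypothetical.

```python
import numpy as np


def expected_exp_reward(m, S, W, target):
    """Expected saturating reward E[exp(-0.5 (x - target)^T W (x - target))]
    for a Gaussian state x ~ N(m, S)."""
    I = np.eye(m.shape[0])
    diff = m - target
    IpSW = I + S @ W
    # (m - target)^T W (I + S W)^{-1} (m - target), computed with a linear solve
    quad = diff @ W @ np.linalg.solve(IpSW, diff)
    return float(np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(IpSW)))


def reference(t, dt=0.1, omega=0.5):
    """Hypothetical reference path: a unit circle traced in the (x, y) plane."""
    phase = omega * t * dt
    return np.array([np.cos(phase), np.sin(phase)])


def trajectory_reward(means, covs, W):
    """Sum of expected rewards when step t is scored against reference(t)
    instead of a single fixed target."""
    return sum(expected_exp_reward(m, S, W, reference(t))
               for t, (m, S) in enumerate(zip(means, covs)))
```

In a full PILCO loop, `means` and `covs` would come from the moment-matched rollout, and the time index t has to be available wherever the reward is evaluated.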

nrontsis · Jun 04 '20 13:06

I think I managed to implement it in the original MATLAB version. What you can do is:

  • Change the linear policy from M = Wm + b to M = Wm + b * r(t), where r(t) is the reference at the current time step t (make sure t is passed to the policy function). Change the policy gradient dMdp as well: its gradient w.r.t. b used to be 1, but is now r(t). I do not believe the gradient w.r.t. the variance changes. Alternatively, use another parametrization, as long as it uses r(t).
  • Pass the current time t to the cost function as well, and use r(t) as the target in the immediate reward instead of a fixed x_target.
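A minimal NumPy sketch of the first bullet, under assumptions of our own: the policy returns the usual moment-matching triple (control mean, control covariance, s⁻¹ cov(x, u)), r(t) is a reference state with the same dimension as x, and b is therefore taken as a D_u × D_x matrix B so that B r(t) has the control dimension. Because r(t) is deterministic, the extra term only shifts the mean; the covariance W s Wᵀ and the cross term Wᵀ are unchanged, which agrees with the remark that the variance gradients stay the same. The name `linear_tracking_policy` is hypothetical and not taken from either PILCO implementation.

```python
import numpy as np


def linear_tracking_policy(m, s, W, B, r_t):
    """Moment-matched linear policy u = W x + B r(t) for x ~ N(m, s).

    Hypothetical parametrization: W and B are (D_u, D_x), r_t is the
    reference state at the current time step t.
    """
    M = W @ m + B @ r_t   # control mean: the only quantity that depends on r(t)
    S = W @ s @ W.T       # control covariance, unchanged since r(t) is deterministic
    C = W.T               # s^{-1} cov(x, u), also unchanged
    D_u = W.shape[0]
    # dM_i / dW_jk = delta_ij * m_k
    dMdW = np.einsum("ij,k->ijk", np.eye(D_u), m)
    # dM_i / dB_jk = delta_ij * r_k  (a plain bias b would give dM/db = 1)
    dMdB = np.einsum("ij,k->ijk", np.eye(D_u), r_t)
    return M, S, C, dMdW, dMdB
```

The gradient tensors have shape (D_u, D_u, D_x); flattening them into a single dMdp vector depends on how the parameters are packed in your own code.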

maxvanmeer · Jun 05 '20 11:06