PILCO
Cost for trajectory following
Hi, I'm trying to use PILCO for path tracking in my graduation thesis, but so far the control results are not ideal. I think they could be improved with a reward for trajectory following. Do you know an easy way to do this? Thanks a lot for the help.
Stefan
I'm wondering the same thing, did you ever figure this out?
The reward is calculated here, so perhaps you could modify it to add what you need?
I think I managed to implement it in the original Matlab version. What you can do is:
- Change the linear policy mean from M = Wm + b to M = Wm + b * r(t), where r(t) is the reference at the current timestep t (make sure t is passed to the function). Update the policy gradient dMdp as well: the gradient w.r.t. b used to be 1, but is now r(t). I do not believe the gradient w.r.t. the variance changes. Alternatively, use another parametrization, as long as it depends on r(t).
- Pass the current time t to the cost function as well, and use r(t) as the target for the immediate reward instead of a fixed x_target.
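To make the two steps above concrete, here is a rough sketch in plain Python. It is not the Matlab code and not the API of this repository. The names `reference`, `policy_mean`, `dM_db` and `tracking_cost` are made up for illustration, r(t) is assumed to be a scalar reference for one tracked state dimension, and the cost is only evaluated at a point instead of in expectation over the Gaussian state distribution as PILCO does.

```python
import math


def reference(t):
    """Hypothetical reference trajectory r(t); here a slow sine wave."""
    return math.sin(0.1 * t)


def policy_mean(W, b, m, t):
    """Mean action of the modified linear policy: M = W m + b * r(t).

    W is a list of rows (one per action dimension), b has one entry per
    action dimension, and m is the mean of the state distribution.
    """
    r = reference(t)
    return [
        sum(w_ij * m_j for w_ij, m_j in zip(row, m)) + b_i * r
        for row, b_i in zip(W, b)
    ]


def dM_db(b, t):
    """Diagonal of dM/db: every entry is r(t); it was 1 with the fixed bias."""
    return [reference(t)] * len(b)


def tracking_cost(x, t, tracked_index=0, width=0.5):
    """Saturating cost 1 - exp(-d^2 / (2 * width^2)) around the target r(t).

    Assumption: only one state dimension (tracked_index) is tracked, and the
    cost is evaluated at the point x rather than under a state distribution.
    """
    d = x[tracked_index] - reference(t)
    return 1.0 - math.exp(-0.5 * d * d / width ** 2)


if __name__ == "__main__":
    W = [[1.0, 2.0]]   # one action, two state dimensions
    b = [3.0]
    m = [0.5, -0.25]
    for t in (0, 5, 10):
        print(t, policy_mean(W, b, m, t), dM_db(b, t), tracking_cost(m, t))
```

The only change relative to a fixed-target setup is that both the policy and the cost look up r(t) at every step, so t has to be threaded through the rollout loop. In the real implementation the same substitution would have to be made in the expected-cost and gradient computations, which this sketch leaves out.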