Justin Yuan
I think it's to take the expectation of the Bellman error target, since you need to marginalize over next actions when evaluating the next Q-value.
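For a discrete-action case, a minimal sketch of what I mean (the function name, the array shapes, and `next_action_probs` coming from the current policy are just illustrative assumptions, not our actual code):

```python
import numpy as np

def expected_bellman_target(reward, done, gamma, q_next, next_action_probs):
    """Bellman target that marginalizes over next actions.

    q_next:            (batch, n_actions) Q-values at the next state
    next_action_probs: (batch, n_actions) policy probabilities at the next state
    reward, done:      (batch,) transition reward and terminal flag
    """
    # E_{a' ~ pi(.|s')}[Q(s', a')] instead of Q(s', a') for one sampled next action
    expected_q_next = np.sum(next_action_probs * q_next, axis=-1)
    return reward + gamma * (1.0 - done) * expected_q_next
```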
- For the RL cost, I think it should use the true parameters, since that's the **only** source of information for learning, and if those are not the true ones, there's...
But for the control methods, this can be tricky, since the cost function is part of both the control algorithm and the environment. The ideal case is that we have a...
@adamhall Is there anything in the code that currently needs to be fixed regarding this issue?
I am leaning towards using `symbolic.U_EQ` for linearization and `env.U_EQ` for the cost function or reward; the current/updated symbolic model should already be able to expose `U_EQ`, but I'm not sure...
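Roughly what I have in mind, as a sketch (the finite-difference `linearize` helper and the quadratic `stage_cost` are placeholders rather than the actual API; the point is only where `symbolic.U_EQ` vs. `env.U_EQ` would plug in):

```python
import numpy as np

def linearize(f, x_eq, u_eq, eps=1e-6):
    """Finite-difference linearization of x_dot = f(x, u) about an equilibrium.

    Intended usage: pass the symbolic model's equilibrium input (symbolic.U_EQ)
    as u_eq, so the linear model is built around the model's own operating point.
    """
    f0 = f(x_eq, u_eq)
    A = np.column_stack([(f(x_eq + eps * e, u_eq) - f0) / eps for e in np.eye(x_eq.size)])
    B = np.column_stack([(f(x_eq, u_eq + eps * e) - f0) / eps for e in np.eye(u_eq.size)])
    return A, B

def stage_cost(x, u, x_goal, u_eq_env, Q, R):
    """Quadratic stage cost with the input penalized relative to the env's
    equilibrium input (pass env.U_EQ as u_eq_env), so the controller's cost
    matches the cost/reward the environment reports.
    """
    dx = x - x_goal
    du = u - u_eq_env
    return dx @ Q @ dx + du @ R @ du
```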