softlearning
Question on the soft q learning implementation
Hi Haarnoja,
Thanks a lot for maintaining this amazing repo! I am a little confused about the implementation of SVGD in soft Q-learning. At https://github.com/rail-berkeley/softlearning/blob/05daa5524ae1a76b70b8a8a29a0f5f824d401484/softlearning/algorithms/sql.py#L281, the log-probs are computed as `log_probs = svgd_target_values + squash_correction`, which is the log-probability in the $u$ (raw action) space, where $a = \tanh(u)$. However, the subsequent SVGD step uses these $u$-space log-probs to compute update directions for $a$, which seems misaligned.
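For context, the squash correction mentioned above is the standard change-of-variables term for $a = \tanh(u)$. Here is a minimal numpy sketch (not the repo's TensorFlow code; function name and the standard-normal sanity check are my own) showing how a $u$-space log-density relates to the $a$-space one:

```python
import numpy as np

def squashed_log_prob(log_prob_u, u, eps=1e-6):
    # Change of variables for a = tanh(u):
    #   log p_a(a) = log p_u(u) - sum_i log(1 - tanh(u_i)^2)
    # (eps guards against log(0) when tanh saturates)
    correction = np.sum(np.log(1.0 - np.tanh(u) ** 2 + eps), axis=-1)
    return log_prob_u - correction

# Sanity check: squash a standard normal through tanh; the corrected
# density over a should still integrate to ~1 on (-1, 1).
a = np.linspace(-0.999, 0.999, 20001)
u = np.arctanh(a)
log_p_u = -0.5 * u ** 2 - 0.5 * np.log(2.0 * np.pi)   # N(0, 1) log-pdf
log_p_a = squashed_log_prob(log_p_u, u[:, None], eps=0.0)
mass = np.exp(log_p_a).sum() * (a[1] - a[0])          # rectangle rule
print(f"integrated mass over a: {mass:.3f}")           # ~1.000
```

Because the correction moves the density between the two spaces, a gradient of these log-probs is a gradient with respect to $u$, not $a$ — which is exactly the mismatch the question points at.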
I think https://github.com/rail-berkeley/softlearning/blob/05daa5524ae1a76b70b8a8a29a0f5f824d401484/softlearning/algorithms/sql.py#L235 should instead be `actions = self._policy.raw_actions(expanded_observations)` (the policy class could expose this property), so that the particles live in the same space as the log-probs.
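To illustrate why the spaces must agree: in SVGD the kernel and the log-density gradient are both taken with respect to the *same* particle variable. Here is a toy numpy sketch of one SVGD iteration on a 1-D standard-normal target (a fixed-bandwidth RBF kernel and my own variable names, not the repo's implementation):

```python
import numpy as np

def grad_log_p(x):
    # Target: standard normal, so grad log p(x) = -x
    return -x

def svgd_step(x, step=0.1, h=0.5):
    # Kernel and its gradient are both w.r.t. the same variable x:
    #   phi(x_i) = mean_j [ k(x_j, x_i) grad_{x_j} log p(x_j)
    #                       + grad_{x_j} k(x_j, x_i) ]
    diff = x[:, None, :] - x[None, :, :]          # (n, n, d), x_j - x_i
    sq = np.sum(diff ** 2, axis=-1)               # (n, n)
    k = np.exp(-sq / (2.0 * h))                   # RBF kernel matrix
    grad_k = -diff / h * k[..., None]             # grad_{x_j} k(x_j, x_i)
    glp = grad_log_p(x)                           # (n, d)
    phi = (k[..., None] * glp[:, None, :] + grad_k).mean(axis=0)
    return x + step * phi

rng = np.random.default_rng(0)
x = rng.normal(loc=-3.0, scale=0.5, size=(50, 1))  # particles start far off
for _ in range(500):
    x = svgd_step(x)
print(f"particle mean after SVGD: {x.mean():.2f}")  # drifts toward 0
```

If the gradients were instead computed in a different space than the kernel's particles (as the question suggests happens with $u$ vs. $a$), the attractive and repulsive terms would no longer combine into a valid update direction.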
Best, Yuxuan
Hey @YuxuanSong, thanks for bringing this up! The SQL implementation in this repo was migrated from https://github.com/haarnoja/softqlearning and I actually haven't tested it thoroughly. I'll try to take a closer look at this soon and make sure it's implemented properly.