softlearning
Question on the soft q learning implementation
Hi Haarnoja,
Thanks a lot for maintaining this amazing repo! I am a little confused about the implementation of SVGD in soft Q-learning. At https://github.com/rail-berkeley/softlearning/blob/05daa5524ae1a76b70b8a8a29a0f5f824d401484/softlearning/algorithms/sql.py#L281, the log-probs are computed as `log_probs = svgd_target_values + squash_correction`, which is the log-probability in the $u$ (raw action) space, where $a = \tanh(u)$. However, the subsequent SVGD step uses these $u$-space log-probs to compute update directions for $a$, which seems misaligned.
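For context, the squash correction mentioned above is the standard change-of-variables term for $a = \tanh(u)$. Here is a minimal numpy sketch (not the repo's TensorFlow code; function name and the standard-normal sanity check are my own) showing how a $u$-space log-density relates to the $a$-space one:

```python
import numpy as np

def squashed_log_prob(log_prob_u, u, eps=1e-6):
    # Change of variables for a = tanh(u):
    #   log p_a(a) = log p_u(u) - sum_i log(1 - tanh(u_i)^2)
    # (eps guards against log(0) when tanh saturates)
    correction = np.sum(np.log(1.0 - np.tanh(u) ** 2 + eps), axis=-1)
    return log_prob_u - correction

# Sanity check: squash a standard normal through tanh; the corrected
# density over a should still integrate to ~1 on (-1, 1).
a = np.linspace(-0.999, 0.999, 20001)
u = np.arctanh(a)
log_p_u = -0.5 * u ** 2 - 0.5 * np.log(2.0 * np.pi)   # N(0, 1) log-pdf
log_p_a = squashed_log_prob(log_p_u, u[:, None], eps=0.0)
mass = np.exp(log_p_a).sum() * (a[1] - a[0])          # rectangle rule
print(f"integrated mass over a: {mass:.3f}")           # ~1.000
```

Because the correction moves the density between the two spaces, a gradient of these log-probs is a gradient with respect to $u$, not $a$ — which is exactly the mismatch the question points at.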
I think https://github.com/rail-berkeley/softlearning/blob/05daa5524ae1a76b70b8a8a29a0f5f824d401484/softlearning/algorithms/sql.py#L235 should instead be `actions = self._policy.raw_actions(expanded_observations)` (the policy class could expose this property), so that the particles live in the same space as the log-probs.
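To illustrate why the spaces must agree: in SVGD the kernel and the log-density gradient are both taken with respect to the *same* particle variable. Here is a toy numpy sketch of one SVGD iteration on a 1-D standard-normal target (a fixed-bandwidth RBF kernel and my own variable names, not the repo's implementation):

```python
import numpy as np

def grad_log_p(x):
    # Target: standard normal, so grad log p(x) = -x
    return -x

def svgd_step(x, step=0.1, h=0.5):
    # Kernel and its gradient are both w.r.t. the same variable x:
    #   phi(x_i) = mean_j [ k(x_j, x_i) grad_{x_j} log p(x_j)
    #                       + grad_{x_j} k(x_j, x_i) ]
    diff = x[:, None, :] - x[None, :, :]          # (n, n, d), x_j - x_i
    sq = np.sum(diff ** 2, axis=-1)               # (n, n)
    k = np.exp(-sq / (2.0 * h))                   # RBF kernel matrix
    grad_k = -diff / h * k[..., None]             # grad_{x_j} k(x_j, x_i)
    glp = grad_log_p(x)                           # (n, d)
    phi = (k[..., None] * glp[:, None, :] + grad_k).mean(axis=0)
    return x + step * phi

rng = np.random.default_rng(0)
x = rng.normal(loc=-3.0, scale=0.5, size=(50, 1))  # particles start far off
for _ in range(500):
    x = svgd_step(x)
print(f"particle mean after SVGD: {x.mean():.2f}")  # drifts toward 0
```

If the gradients were instead computed in a different space than the kernel's particles (as the question suggests happens with $u$ vs. $a$), the attractive and repulsive terms would no longer combine into a valid update direction.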
Best, Yuxuan
Hey @YuxuanSong, thanks for bringing this up! The SQL implementation in this repo was migrated from https://github.com/haarnoja/softqlearning and I actually haven't tested it thoroughly. I'll try to take a closer look at this soon and make sure it's implemented properly.