stable-baselines3-contrib

SIL

Open qgallouedec opened this issue 2 years ago • 8 comments

Self Imitation Learning (SIL). @emrul has implemented SAIL; see https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/pull/139#issuecomment-1445114579

@emrul, is there an official implementation for those two? Do you match the results from the paper with your implementation?

qgallouedec, Feb 26 '23 08:02

Hi @qgallouedec - I haven't done much testing, but if there's no rush I'd love to work on this in my spare time. The official implementation appears to be here: https://github.com/google-research/google-research/tree/master/sail_rl

emrul, Feb 26 '23 21:02

There is no rush at all :)

qgallouedec, Feb 27 '23 08:02

Hey everyone,

I have tried the code that @emrul pasted in the IQN PR comments, and it works.

One thing that I haven't got to work is the SubprocVecEnv wrapping. Just wanted to let you know. :)

richardjozsa, Feb 28 '23 13:02

Thanks @richardjozsa - that's interesting because I exclusively use SubprocVecEnv for training and DummyVecEnv for evaluation. What happens when you use SubprocVecEnv?

emrul, Feb 28 '23 14:02

This is the error that I got, but if it works for you then I will recheck. I use a custom env; maybe that caused something.

Traceback (most recent call last):
RLTEST |   File "/usr/lib/python3.10/multiprocessing/forkserver.py", line 274, in main
RLTEST |     code = _serve_one(child_r, fds,
RLTEST |   File "/usr/lib/python3.10/multiprocessing/forkserver.py", line 313, in _serve_one
RLTEST |     code = spawn._main(child_r, parent_sentinel)
RLTEST |   File "/usr/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
RLTEST |     self = reduction.pickle.load(from_parent)
RLTEST |   File "/home/ftuser/.local/lib/python3.10/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 375, in __setstate__
RLTEST |     self.var = cloudpickle.loads(var)
RLTEST | ModuleNotFoundError: No module named 'base'

richardjozsa, Feb 28 '23 14:02

That looks like an error when unpickling your env. My modifications don't change anything about envs (the replay buffer holds the SAIL returns internally), so I don't think my amendments cause it.
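
For readers who hit the same traceback, here is a minimal sketch of how this kind of error can arise, using only the standard library `pickle` rather than `cloudpickle` or stable-baselines3. It assumes the cause is a class pickled by reference to a module that the worker process cannot import. The module name `my_custom_env`, the class `CustomEnv`, and the helper `reproduce_missing_module_error` are hypothetical names chosen for illustration; they are not taken from the thread.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Hypothetical stand-in for a custom env defined in a local module.
MODULE_NAME = "my_custom_env"
MODULE_SOURCE = "class CustomEnv:\n    pass\n"


def reproduce_missing_module_error():
    """Pickle an object, then unpickle it where its module is not importable."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        module_path = os.path.join(tmp_dir, MODULE_NAME + ".py")
        with open(module_path, "w") as f:
            f.write(MODULE_SOURCE)

        # "Parent process": the module is on sys.path, so pickling succeeds.
        sys.path.insert(0, tmp_dir)
        importlib.invalidate_caches()
        module = importlib.import_module(MODULE_NAME)
        payload = pickle.dumps(module.CustomEnv())

        # "Worker process": simulate an import path that lacks the module.
        sys.path.remove(tmp_dir)
        del sys.modules[MODULE_NAME]
        importlib.invalidate_caches()

        try:
            pickle.loads(payload)
        except ModuleNotFoundError as exc:
            return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce_missing_module_error())
```

The pickle payload stores only the module and class name, so loading it requires importing that module again. If the subprocess started by the vectorized env does not share the parent's import path, the import fails in the same way as in the traceback above.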

emrul, Feb 28 '23 14:02

My bad, sorry: the problem was in my environment, and it works fine. One comment: you have set the replay buffer to device="cpu". I guess that could be "auto". :)

richardjozsa, Feb 28 '23 15:02

Great, and yes - good catch on the device, I will correct that!
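
As a rough illustration of what an "auto" device setting usually means, here is a simplified sketch. It is not the stable-baselines3 code or the SAIL patch; `resolve_device` is a hypothetical helper, and the fallback to "cpu" when PyTorch is not installed is an assumption made so the snippet runs anywhere.

```python
def resolve_device(device="auto"):
    """Return a concrete device name for a requested device string.

    Hypothetical helper: "auto" picks "cuda" when PyTorch reports a usable
    GPU and "cpu" otherwise; any other value is returned unchanged.
    """
    if device != "auto":
        return device
    try:
        import torch  # optional dependency in this sketch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    # An explicit request is respected; "auto" adapts to the machine.
    print(resolve_device("cpu"))
    print(resolve_device("auto"))
```

Hard-coding "cpu" keeps sampled batches on the CPU even on a machine with a GPU, while "auto" lets the buffer follow whatever hardware is available.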

emrul, Feb 28 '23 15:02