[BUG] dreamer example broken by ``ObservationNorm``
## Describe the bug
`ObservationNorm` with a fixed `obs_norm_state_dict` is used in the dreamer example, and this `obs_norm_state_dict` is estimated from a rollout of only `cfg.init_env_steps` steps. When the observation is pixels, some pixels never change over the whole trajectory, which produces zero scales.
Although `eps=1e-6` keeps the scales non-zero, the normalized observations are still far too large, which makes the Encoder output None and causes huge actor and model losses.
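For intuition, here is a minimal standalone sketch (illustrative values, not the actual example code) of how a pixel that stays constant during the short init rollout yields a near-zero scale, assuming the standard-normal convention `(obs - loc) / scale`:

```python
import torch

eps = 1e-6
# Pretend init rollout where one pixel channel never changes.
rollout_pixels = torch.full((1000,), 0.3)
loc = rollout_pixels.mean()                  # 0.3
scale = rollout_pixels.std().clamp_min(eps)  # std is 0 -> clamped to 1e-6

# Later, that pixel takes a slightly different value:
obs = torch.tensor(0.4)
print((obs - loc) / scale)  # ~1e5: explodes downstream in the Encoder
```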
## To Reproduce
Run the dreamer example with exactly the default config:
```bash
python examples/dreamer/dreamer.py
```
## Expected behavior
Setting `obs_norm_state_dict` to a sensible fixed value avoids this issue:
```python
# key, init_env_steps, stats = None, None, None
# if not cfg.vecnorm and cfg.norm_stats:
#     if not hasattr(cfg, "init_env_steps"):
#         raise AttributeError("init_env_steps missing from arguments.")
#     key = ("next", "pixels") if cfg.from_pixels else ("next", "observation_vector")
#     init_env_steps = cfg.init_env_steps
#     stats = {"loc": None, "scale": None}
# elif cfg.from_pixels:
#     stats = {"loc": 0.5, "scale": 0.5}
# proof_env = transformed_env_constructor(
#     cfg=cfg, use_env_creator=False, stats=stats
# )()
# initialize_observation_norm_transforms(
#     proof_environment=proof_env, num_iter=init_env_steps, key=key
# )
# _, obs_norm_state_dict = retrieve_observation_norms_state_dict(proof_env)[0]
# proof_env.close()
obs_norm_state_dict = {"loc": 0.5, "scale": 0.5}
```
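With these fixed stats (under the same `(obs - loc) / scale` convention as above), pixel observations in [0, 1] land in the well-behaved range [-1, 1]:

```python
import torch

loc, scale = 0.5, 0.5
pixels = torch.rand(3, 64, 64)       # pixel observations in [0, 1]
normalized = (pixels - loc) / scale  # mapped into [-1, 1]
print(normalized.min(), normalized.max())
```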
## Screenshots
Dreamer__8044d54b_23_11_13-01_54_04 is the original run, and Dreamer__8c83e177_23_11_13-02_02_24 is the modified one.
`loss_model_kl`, `loss_world_model`, `loss_model_reco`, `loss_model_reward` and `grad_world_model` are None in the original run, and `r_training` is low; it seems the original run learned nothing.
BTW, the modified run also breaks around 155k steps: `loss_actor` suddenly increases hugely. However, I am not an expert on dreamer; could someone tell me why?
## System info
```
torchrl.__version__ = 0.2.1
numpy.__version__ = 1.26.1
sys.version = 3.9.18 (main, Sep 11 2023, 13:41:44) [GCC 11.2.0]
sys.platform = linux
```
## Reason and Possible fixes
See Expected behavior above.
## Checklist
- [x] I have checked that there is no similar issue in the repo (required)
- [x] I have read the documentation (required)
- [x] I have provided a minimal working example to reproduce the bug (required)
---
Thanks for pointing this out! Fixing dreamer is one of my top priorities for the next release, I'll do my best to address this asap.
---
@vmoens Cool! I have noticed that there is an issue listing some potential improvements (https://github.com/pytorch/rl/issues/916); I am really looking forward to it!
And just like MPC, dreamer also fails to deal with early-stop envs; I am addressing that now. But I have a question: `WorldModelWrapper` wraps the transition and reward models, but does not cover a terminated model. Why?
To make dreamer sensitive to done, we need to learn it explicitly. To do that, we need a new `WorldModelWrapper` that includes a `transition_model`, `terminated_model` and `reward_model`; maybe we could take `terminated_model` as an optional arg?
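For concreteness, a minimal sketch of what I have in mind, built on `tensordict.nn` (the class name `WorldModelWithTermination` and the optional `terminated_model` arg are my assumptions, not an existing torchrl API):

```python
from typing import Optional

from tensordict.nn import TensorDictModule, TensorDictSequential


class WorldModelWithTermination(TensorDictSequential):
    """Hypothetical wrapper chaining transition, reward and terminated models.

    terminated_model defaults to None so existing two-model usage keeps working.
    """

    def __init__(
        self,
        transition_model: TensorDictModule,
        reward_model: TensorDictModule,
        terminated_model: Optional[TensorDictModule] = None,
    ):
        modules = [transition_model, reward_model]
        if terminated_model is not None:
            modules.append(terminated_model)
        super().__init__(*modules)
```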