
[Refactor] Dreamer loss modules

Open MateuszGuzek opened this issue 2 years ago • 2 comments

Description

Refactor of the dreamer loss. WIP, TODO: forward function.

Motivation and Context

Requested by @vmoens per popular request

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

  • [x] Documentation (update in the documentation)
  • [x] Example (update in the folder of examples)

Checklist

Go over all the following points, and put an x in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • [x] I have read the CONTRIBUTION guide (required)
  • [x] My change requires a change to the documentation.
  • [x] I have updated the tests accordingly (required for a bug fix or a new feature).
  • [x] I have updated the documentation accordingly.

MateuszGuzek avatar Sep 14 '23 14:09 MateuszGuzek

When you now run the example, you encounter this error:

    File "./rl/examples/dreamer/dreamer.py", line 321, in main
      scaler_actor.scale(actor_loss_td["loss_actor"]).backward()
    File "~/anaconda3/envs/rl_dreamer_3_10/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
      torch.autograd.backward(
    File "~/anaconda3/envs/rl_dreamer_3_10/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
      Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [400, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

I investigated, and this always happens when the backprop and optimization steps are applied to the results of the world model. The other losses and the scaling do not cause the issue. torch.autograd.set_detect_anomaly(True) did not return anything interesting.
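For reference, here is a minimal, self-contained sketch of what I believe is the same failure mode, using a toy model rather than the Dreamer code: two losses are built from one shared forward pass, and an optimizer step is taken in between the two backward passes.

```python
# Toy reproduction sketch (hypothetical modules, not the Dreamer code):
# two losses share one forward pass, and an in-place parameter update
# happens between their backward passes.
import torch
from torch import nn

shared = nn.Linear(4, 4)   # stands in for the world model
head_a = nn.Linear(4, 1)   # stands in for the model/value head
head_b = nn.Linear(4, 1)   # stands in for the actor head

opt_shared = torch.optim.SGD(shared.parameters(), lr=0.1)

x = torch.randn(8, 4, requires_grad=True)
h = shared(x)                       # single shared forward pass
loss_a = head_a(h).mean()
loss_b = head_b(h).mean()

loss_a.backward(retain_graph=True)  # first loss: backward ...
opt_shared.step()                   # ... then an in-place parameter update

# shared.weight (as a saved tensor / view) is still needed by loss_b's graph;
# its version counter advanced during step(), so this raises:
# "one of the variables needed for gradient computation has been modified
#  by an inplace operation"
loss_b.backward()
```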

When each model loss is applied separately (as in the previous commit of this pull request), everything works fine.

Does this mean that there is some sharing of parameters that is triggered only in forward(), but not in the standard call of the new loss functions? @vmoens

MateuszGuzek avatar Sep 15 '23 14:09 MateuszGuzek

I think the error you're getting is due to the fact that you're computing all the losses at once, but then backpropagating them one at a time.
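For contrast with the failing pattern above, a toy sketch of backpropagating one loss at a time, each with its own forward pass (this is roughly what the previous commit did; gradient scalers are omitted for brevity, and the module names are illustrative):

```python
# Per-loss pattern sketch: each loss gets its own forward pass, so the
# optimizer step taken for one loss cannot invalidate saved tensors in
# another loss's graph.
import torch
from torch import nn

shared = nn.Linear(4, 4)
head_a = nn.Linear(4, 1)
head_b = nn.Linear(4, 1)

opt_shared = torch.optim.SGD(shared.parameters(), lr=0.1)
x = torch.randn(8, 4)

# first loss: its own forward, backward and optimizer step
loss_a = head_a(shared(x)).mean()
loss_a.backward()
opt_shared.step()
opt_shared.zero_grad()

# second loss: a fresh forward pass sees the already-updated parameters,
# so no stale saved tensor is reused and no error is raised
loss_b = head_b(shared(x)).mean()
loss_b.backward()
```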

In Dreamer each loss has its own scaler used during backpropagation, so backprop is done for each of them individually. Does that mean that doing a forward pass like in this example is simply not applicable in the case of Dreamer, and that forward() should either not be implemented or raise a NotImplementedError?

I think there is no real need for a forward() that computes everything at once, but it is a question of codebase consistency whether we allow a loss module that does not implement forward().
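If we went the "no forward()" route, a minimal sketch of what that could look like (hypothetical class and method names, not the final torchrl API):

```python
# Sketch only: one method per Dreamer loss, with forward() deliberately
# unimplemented because the losses are backpropagated separately.
import torch
from torch import nn
from tensordict import TensorDict


class DreamerLosses(nn.Module):  # hypothetical name, for illustration
    def loss_world_model(self, tensordict: TensorDict) -> torch.Tensor:
        ...  # reconstruction + reward + KL terms

    def loss_actor(self, tensordict: TensorDict) -> torch.Tensor:
        ...  # imagined-return objective

    def loss_value(self, tensordict: TensorDict) -> torch.Tensor:
        ...  # lambda-return regression

    def forward(self, tensordict: TensorDict) -> TensorDict:
        raise NotImplementedError(
            "Dreamer losses are backpropagated separately; call "
            "loss_world_model / loss_actor / loss_value individually."
        )
```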

MateuszGuzek avatar Sep 15 '23 15:09 MateuszGuzek