DecisionTransformerInterpretability
Investigate whether anyone else already does this, or just experiment with it directly: fine-tune PPO models without the entropy bonus at the end of training, to remove entropy-optimising behaviors.
See discussion in #43
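One way to try the idea above is to keep the usual entropy bonus for most of training and set its coefficient to zero for a final fine-tuning phase. The sketch below is only an illustration under assumptions: the function names, the hard switch (rather than a gradual anneal), and the default values `base_coef=0.01` and `vf_coef=0.5` are hypothetical and not taken from this repository.

```python
def entropy_coef(step: int, total_steps: int, finetune_steps: int,
                 base_coef: float = 0.01) -> float:
    """Entropy coefficient for a given update step (hypothetical helper).

    Returns base_coef during the main phase and 0.0 during the last
    finetune_steps updates, so the final phase has no entropy bonus.
    """
    cutoff = total_steps - finetune_steps
    return base_coef if step < cutoff else 0.0


def ppo_total_loss(policy_loss: float, value_loss: float, entropy: float,
                   ent_coef: float, vf_coef: float = 0.5) -> float:
    """Standard PPO objective shape: the entropy term is subtracted,
    so a positive ent_coef rewards higher-entropy policies."""
    return policy_loss + vf_coef * value_loss - ent_coef * entropy


if __name__ == "__main__":
    total, finetune = 1000, 100
    for step in (0, 899, 900, 999):
        coef = entropy_coef(step, total, finetune)
        loss = ppo_total_loss(1.0, 2.0, 0.5, coef)
        print(step, coef, loss)
```

With a zero coefficient the loss no longer rewards entropy, so any behaviour the agent keeps during the fine-tuning phase has to be justified by return alone.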