Retraining current best model flattens performance immediately
We trained our model using mjrl, with a PPO agent and an MLP policy. When we resume training from our current best model, performance flattens immediately (see the attachment: we stopped training after 75 iterations, then restarted with the same reward function and the 75-iteration model). When we instead train an entirely new policy for 100 iterations, it keeps improving past iteration 75. So restarting training itself seems to hurt the training performance.
Do you have any idea whether resuming training of a policy is possible, and how to fix this problem?
Thanks in advance!

Hi @Remcop04,
I have used the NPG agent from the mjrl repo extensively with restarts, and it works reasonably well. What I usually do is boost the policy's log_std a bit to revive exploration, in case it has shrunk too much to make progress; see the sketch below. Hope it helps.
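A minimal sketch of that boost, assuming the usual mjrl Gaussian MLP policy with its exploration noise in a `log_std` tensor (the checkpoint path and the 0.5 boost are illustrative, not from the repo):

```python
import pickle

import torch

# Path is illustrative -- point this at whichever iteration pickle you resume from.
with open("iterations/best_policy.pickle", "rb") as f:
    policy = pickle.load(f)

# mjrl's Gaussian MLP policy keeps per-action exploration noise in a log_std
# tensor; nudging it upward widens the action distribution again.
# The 0.5 boost (in log space) is a hyperparameter to tune.
policy.log_std.data += 0.5

# Some versions of the policy also cache a numpy copy of log_std;
# refresh it if present so sampling sees the new value.
if hasattr(policy, "log_std_val"):
    policy.log_std_val = policy.log_std.data.numpy().ravel()
```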
PS: I haven't used the PPO agent, so I can't comment on it specifically. It does seem to follow the same reload-from-checkpoint pipeline; this is where the existing logs are read, so a resume along the lines of the sketch below should be possible. Also tagging @aravindr93, in case he can share some wisdom.
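For what it's worth, a resume run could look roughly like this. It assumes mjrl's standard job-script pieces (GymEnv, MLPBaseline, PPO, train_agent); the env, job name, and hyperparameters are placeholders, so check the argument names against your mjrl version:

```python
import pickle

from mjrl.utils.gym_env import GymEnv
from mjrl.baselines.mlp_baseline import MLPBaseline
from mjrl.algos.ppo_clip import PPO
from mjrl.utils.train_agent import train_agent

env = GymEnv("Hopper-v2")  # placeholder env

# Reload the 75-iteration policy instead of constructing a fresh MLP.
with open("iterations/policy_75.pickle", "rb") as f:
    policy = pickle.load(f)

baseline = MLPBaseline(env.spec, reg_coef=1e-3, batch_size=64,
                       epochs=2, learn_rate=1e-3)
agent = PPO(env, policy, baseline, clip_coef=0.2, epochs=10,
            mb_size=64, learn_rate=3e-4, save_logs=True)

train_agent(job_name="hopper_ppo_resume",  # placeholder job name
            agent=agent,
            seed=123,
            niter=25,  # the remaining iterations
            gamma=0.995,
            sample_mode="trajectories",
            num_traj=100,
            save_freq=5,
            evaluation_rollouts=5)
```

Note that the baseline here is freshly initialized; if your original run also pickled the baseline, reloading it as well avoids restarting the advantage estimation from scratch.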