[Question] Reward net transfer

Open · risufaj opened this issue on Feb 2, 2024 · 1 comment

Hello,

I want to run IRL on a task for which I have some expert demonstrations. The demonstrations are somewhat old, and the action space has grown since they were recorded: the first version of the task had only 5 actions, while the new version adds 3 more (8 in total). Is it possible to train a reward net on the existing expert demonstrations (e.g., with AIRL) and then use the trained reward net to train a new policy that can also take the added actions? If so, I'm not sure what that would look like when creating a RewardNet class.

I would appreciate some help.

Thanks in advance.

risufaj · Feb 02 '24 23:02
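To see why the growing action space is a problem for a reward that conditions on the action, here is a minimal plain-Python sketch (not the `imitation` API). `StateActionReward`, `one_hot`, the weights, and the 2-feature state are all hypothetical. With a one-hot action encoding, the reward's input width is fixed by the 5 actions seen during training, so an action that only exists in the new version has no input slot at all.

```python
from typing import List, Sequence


def one_hot(index: int, size: int) -> List[float]:
    """Encode a discrete action index as a one-hot vector of length `size`."""
    if not 0 <= index < size:
        raise ValueError(f"action {index} is outside a {size}-action space")
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


class StateActionReward:
    """Hypothetical linear reward r(s, a) = w . [s, onehot(a)].

    The weight vector has one entry per state feature plus one per action,
    so its size is fixed by the action space seen during training.
    """

    def __init__(self, weights: Sequence[float], state_dim: int, n_actions: int):
        if len(weights) != state_dim + n_actions:
            raise ValueError("expected one weight per state feature and per action")
        self.weights = list(weights)
        self.n_actions = n_actions

    def __call__(self, state: Sequence[float], action: int) -> float:
        features = list(state) + one_hot(action, self.n_actions)
        return sum(w * x for w, x in zip(self.weights, features))


# Pretend these weights were learned from the old 5-action demonstrations.
reward = StateActionReward(
    [1.0, -0.5, 0.0, 0.25, 0.5, 0.75, 1.0], state_dim=2, n_actions=5
)
print(reward([2.0, 1.0], 2))  # 2.0

# Action 6 only exists in the new 8-action version of the task.
try:
    reward([2.0, 1.0], 6)
except ValueError as err:
    print(err)  # action 6 is outside a 5-action space
```

Even if the encoding were widened, the weights for the 3 new actions would never have been trained, because the old demonstrations never contain them.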

I think you can reuse the reward net to train a new policy with the added actions, as long as the reward net is state-only, i.e. it does not take the action as an input (for example, `BasicRewardNet` with `use_action=False`).

rizqisubeno · Sep 20 '24 07:09
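A minimal plain-Python sketch of the state-only idea from the answer, not the `imitation` API. `StateOnlyReward`, the weights, and the 2-feature state are hypothetical. The reward keeps the usual `(state, action, next_state, done)` call shape but ignores everything except the state, so it returns a value for any action index, including the 3 new ones.

```python
from typing import Optional, Sequence


class StateOnlyReward:
    """Hypothetical linear reward r(s) = w . s that never reads the action."""

    def __init__(self, weights: Sequence[float]):
        self.weights = list(weights)

    def __call__(
        self,
        state: Sequence[float],
        action: Optional[int] = None,
        next_state: Optional[Sequence[float]] = None,
        done: bool = False,
    ) -> float:
        # action, next_state and done are accepted only so the call signature
        # matches a (s, a, s', done) reward; none of them affect the result.
        del action, next_state, done
        return sum(w * x for w, x in zip(self.weights, state))


# Pretend these weights were learned from the old 5-action demonstrations.
reward = StateOnlyReward([1.0, -0.5])
state = [2.0, 1.0]

old_rewards = [reward(state, a) for a in range(5)]  # original actions 0-4
new_rewards = [reward(state, a) for a in range(8)]  # includes new actions 5-7
print(old_rewards[0], new_rewards[7])  # 1.5 1.5
```

The new policy is then rewarded for the states its actions lead to, so the added actions are judged only by their effect on the state. Whether that gives a good policy still depends on how well the reward learned from the old demonstrations captures the task.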