imitation
MCE IRL: add support for state-action rewards
Currently, MCE IRL supports only state-based rewards. Ideally we'd allow state-action rewards as well. This would be easy on the RewardNet side, but it would also require computing the state-action occupancy measure, whereas we currently compute only the state occupancy measure.
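
For reference, here is a rough sketch of what the state-action occupancy computation could look like in the tabular case. It is not the existing implementation: the function name `state_action_occupancy`, the argument layout, and the choice of an undiscounted finite horizon that counts only the steps at which an action is taken are all assumptions made for illustration.

```python
import numpy as np


def state_action_occupancy(transition, initial_dist, policy):
    """Hypothetical helper: undiscounted, finite-horizon occupancy measures.

    transition:   (S, A, S) array, transition[s, a, s2] = P(s2 | s, a)
    initial_dist: (S,) array, distribution over the initial state
    policy:       (H, S, A) array, policy[t, s, a] = pi_t(a | s)

    Returns (sa_occupancy, state_occupancy) with shapes (S, A) and (S,).
    """
    horizon, n_states, n_actions = policy.shape
    sa_occupancy = np.zeros((n_states, n_actions))
    state_dist = np.asarray(initial_dist, dtype=float)
    for t in range(horizon):
        # Joint probability of being in state s and taking action a at step t.
        sa_t = state_dist[:, None] * policy[t]
        sa_occupancy += sa_t
        # Push the joint distribution through the dynamics to get step t + 1.
        state_dist = np.einsum("sa,sat->t", sa_t, transition)
    # Under this convention the state occupancy is just the action marginal.
    return sa_occupancy, sa_occupancy.sum(axis=1)


# Toy MDP (assumed for illustration): action 0 stays put, action 1 switches state.
transition = np.zeros((2, 2, 2))
transition[0, 0, 0] = transition[1, 0, 1] = 1.0
transition[0, 1, 1] = transition[1, 1, 0] = 1.0
uniform_policy = np.full((3, 2, 2), 0.5)
sa_om, state_om = state_action_occupancy(
    transition, np.array([1.0, 0.0]), uniform_policy
)
```

With something like this, the state occupancy falls out as a marginal of the state-action one, so the state-only case could stay as a special case rather than a separate code path.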