Inconsistent naming of expert demonstrations
The expert demonstrations are referred to in the codebase as:
- episodes
- trajectories
- lists of transitions
- rollouts
- demonstrations
I think we should clarify in the documentation what we mean by them and maybe get rid of some of the terms. What are your thoughts?
Not sure how I missed this the first time around. I agree we should standardize, but some care is needed here, as each of these names means a different thing, or at least has different connotations.
For example, lists of transitions and trajectories are not the same data type. You can get lists of transitions from trajectories, but not the other way around. Some algorithms, like BC, can learn from sequences of transitions alone; others (like MCE IRL, IIRC) need whole trajectories.
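To make the one-way conversion concrete, here is a minimal sketch. `Transition`, `Trajectory` and `flatten_trajectories` below are simplified, hypothetical stand-ins, not the library's actual classes: a trajectory keeps the whole ordered episode, while flattening keeps only per-step tuples, so episode boundaries survive at best as a `done` flag.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Transition:
    """A single step (hypothetical stand-in type)."""
    obs: int
    act: int
    next_obs: int
    done: bool


@dataclass(frozen=True)
class Trajectory:
    """A whole episode; len(obs) == len(acts) + 1 (hypothetical stand-in type)."""
    obs: List[int]
    acts: List[int]
    terminal: bool


def flatten_trajectories(trajs: Sequence[Trajectory]) -> List[Transition]:
    """Concatenate the steps of every trajectory into one flat list.

    Only the last step of a *terminal* trajectory gets done=True, so the end of a
    truncated trajectory is lost and the trajectories cannot be rebuilt in general.
    """
    transitions: List[Transition] = []
    for traj in trajs:
        assert len(traj.obs) == len(traj.acts) + 1
        for i, act in enumerate(traj.acts):
            is_last = i == len(traj.acts) - 1
            transitions.append(
                Transition(obs=traj.obs[i], act=act, next_obs=traj.obs[i + 1],
                           done=is_last and traj.terminal)
            )
    return transitions


trajs = [Trajectory(obs=[0, 1, 2], acts=[10, 11], terminal=True),
         Trajectory(obs=[5, 6], acts=[12], terminal=False)]
print(len(flatten_trajectories(trajs)))  # 3
```

A transition-based learner like BC could consume the flat list directly, whereas an algorithm that needs per-episode structure would have to receive the trajectories themselves.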
I tend to think of demonstrations as rollouts from the expert, whereas rollouts can also come from other policies. "Episodes" is largely redundant with "rollouts" and is probably best avoided, except in phrases like "number of episodes".
So I propose:
- Decide if we want to call it episodes, trajectories or rollouts? Maybe look at the naming conventions of SB3.
- Use "demonstrations" only as a parameter name, and then qualified with the type, e.g. `demonstration_transitions` and `demonstration_trajectories`/`demonstration_rollouts`...
> So I propose:
> - Decide if we want to call it episodes, trajectories or rollouts? Maybe look at the naming conventions of SB3.
OTOH I'd vote for calling them trajectories, although I'm happy to stick with an SB3 naming convention if one exists. "Trajectories" feels like the most general term. I don't think "rollouts from a human" makes much sense, for example. And in theory we might not even always have an episodic environment (https://github.com/HumanCompatibleAI/imitation/issues/575), so calling them episodes is odd.
> - Demonstrations should only be used as a parameter name but with the qualification of the type. So use `demonstration_transitions` and `demonstration_trajectories`/`demonstration_rollouts`...
I'd personally lean towards just calling them demonstrations; the type annotation usually makes it clear what types they can accept. Also, any algo that can take transitions can also take trajectories (since we can flatten trajectories into transitions), and it feels a bit odd to write `func(demonstrations_transitions=expert_trajectories)`.
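A rough sketch of the annotation-based alternative, under the assumption that a single `demonstrations` parameter accepts either form. `Transition`, `Trajectory`, `Demonstrations`, `to_transitions` and `train_bc` are all hypothetical names for illustration, not the library's API.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass(frozen=True)
class Transition:
    obs: int
    act: int
    next_obs: int
    done: bool


@dataclass(frozen=True)
class Trajectory:
    obs: List[int]  # one more entry than acts
    acts: List[int]
    terminal: bool


# The annotation, not the parameter name, states which forms are accepted.
Demonstrations = Union[Sequence[Trajectory], Sequence[Transition]]


def to_transitions(demonstrations: Demonstrations) -> List[Transition]:
    """Normalize either accepted form into a flat list of transitions."""
    out: List[Transition] = []
    for item in demonstrations:
        if isinstance(item, Transition):
            out.append(item)
        else:
            # Flatten a trajectory; only the final step of a terminal episode is done.
            n = len(item.acts)
            for i in range(n):
                out.append(Transition(item.obs[i], item.acts[i], item.obs[i + 1],
                                      done=(i == n - 1) and item.terminal))
    return out


def train_bc(demonstrations: Demonstrations) -> int:
    """Hypothetical BC entry point; returns how many transitions it would train on."""
    transitions = to_transitions(demonstrations)
    # A real implementation would fit a policy on `transitions` here.
    return len(transitions)


expert_trajectories = [Trajectory(obs=[0, 1, 2], acts=[3, 4], terminal=True)]
print(train_bc(demonstrations=expert_trajectories))  # 2
```

The call site then reads `train_bc(demonstrations=expert_trajectories)`, avoiding the name/type mismatch mentioned above, at the cost of the function having to normalize its input itself.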
SB3 uses the terms trajectory and rollout, but "rollout" sometimes refers to something generated by a learned model, so I think "trajectory" is less ambiguous.
I agree that type annotations should be enough to distinguish between different types of demonstrations.