A question on the info loss
In this file: https://github.com/ermongroup/MetaIRL/blob/master/inverse_rl/models/info_airl_state_train.py#L141
From this code, the info loss is the negative log-likelihood of q_\psi(m | expert trajectory):
# Get "m" distribution by feeding expert trajectory
context_dist_info_vars = self.context_encoder.dist_info_sym(expert_traj_var)
# Sample a "m" from the distribution
context_mean_var = context_dist_info_vars["mean"]
context_log_std_var = context_dist_info_vars["log_std"]
eps = tf.random.normal(shape=tf.shape(context_mean_var))
reparam_latent = eps * tf.exp(context_log_std_var) + context_mean_var
# Compute the log probability of the sampled "m" in its own distribution
log_q_m_tau = tf.reshape(self.context_encoder.distribution.log_likelihood_sym(reparam_latent, context_dist_info_vars) ...
info_loss = - tf.reduce_mean(log_q_m_tau ...
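To make sure I am reading this right, here is a minimal NumPy sketch of what I believe these lines compute, assuming the context encoder outputs a diagonal Gaussian (the shapes and values below are made up for illustration):

```python
import numpy as np

def gaussian_log_likelihood(x, mean, log_std):
    # log-density of a diagonal Gaussian, summed over latent dimensions
    return np.sum(
        -0.5 * np.log(2.0 * np.pi) - log_std - 0.5 * ((x - mean) / np.exp(log_std)) ** 2,
        axis=-1,
    )

# Pretend outputs of the context encoder on a batch of expert trajectories
mean = np.zeros((4, 3))          # q_psi mean, shape (batch, latent_dim)
log_std = np.full((4, 3), -1.0)  # q_psi log std

# Reparameterized sample of "m", as in the excerpt above
eps = np.random.normal(size=mean.shape)
m = eps * np.exp(log_std) + mean

# Negative log-likelihood of the sample under the *same* distribution it was drawn from
info_loss = -np.mean(gaussian_log_likelihood(m, mean, log_std))
print(info_loss)
```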
However, this is different from the equation shown in the original paper. In my understanding of the paper, the following steps are performed:
1. Use the expert trajectory to get an "m distribution": q_\psi(\cdot | expert trajectory).
2. Use the "m distribution" to sample a context vector "m".
3. Feed "m" to the policy and collect an agent trajectory "tau_agent".
4. Feed the agent trajectory to the posterior to get an estimated "m'": q_\psi(m' | agent trajectory).
5. Compute the log probability of the estimated "m'" under the original "m distribution" estimated from the expert trajectory.

However, in the code above, you simply compute the log probability of a sampled "m" under its own distribution q_\psi(\cdot | expert trajectory), estimated from the expert trajectory. That is, steps 3 and 4 are skipped.
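For concreteness, my reading of the paper's version would look roughly like the sketch below; `context_encoder`, `policy`, and `rollout` are hypothetical placeholders, not the repo's actual API:

```python
import numpy as np

def gaussian_log_likelihood(x, mean, log_std):
    # log-density of a diagonal Gaussian, summed over latent dimensions
    return np.sum(
        -0.5 * np.log(2.0 * np.pi) - log_std - 0.5 * ((x - mean) / np.exp(log_std)) ** 2,
        axis=-1,
    )

def paper_info_loss(expert_traj, context_encoder, policy, rollout):
    """Hypothetical sketch of steps 1-5 above; all callables are placeholders."""
    # 1. m-distribution from the expert trajectory: q_psi(. | expert_traj)
    mean_e, log_std_e = context_encoder(expert_traj)
    # 2. Sample a context vector m from that distribution (reparameterized)
    m = np.random.normal(size=np.shape(mean_e)) * np.exp(log_std_e) + mean_e
    # 3. Roll out the m-conditioned policy to get an agent trajectory
    agent_traj = rollout(policy, m)
    # 4. Posterior over m' estimated from the *agent* trajectory
    mean_a, log_std_a = context_encoder(agent_traj)
    m_prime = np.random.normal(size=np.shape(mean_a)) * np.exp(log_std_a) + mean_a
    # 5. Log probability of m' under the expert-estimated m-distribution
    return -np.mean(gaussian_log_likelihood(m_prime, mean_e, log_std_e))
```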
I want to know if this is intended and if my understanding is correct.
Thank you!