
Inconsistency in populating State.info

Open namheegordonkim opened this issue 3 years ago • 1 comments

It seems that the class State defines info as a dictionary, and env wrappers are getting away with treating it as such:

@struct.dataclass
class State:
  """Environment state for training and inference."""
  qp: brax.QP
  obs: jp.ndarray
  reward: jp.ndarray
  done: jp.ndarray
  metrics: Dict[str, jp.ndarray] = struct.field(default_factory=dict)
  info: Dict[str, Any] = struct.field(default_factory=dict)

e.g. https://github.com/google/brax/blob/a09f3e75390c1ff017565ba2acfa77e838535fd3/brax/envs/wrappers.py#L67

  def reset(self, rng: jp.ndarray) -> brax_env.State:
    state = self.env.reset(rng)
    state.info['steps'] = jp.zeros(())
    state.info['truncation'] = jp.zeros(())
    return state
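
A minimal sketch of why this wrapper pattern works, assuming struct.dataclass behaves like a frozen dataclass. It uses a frozen standard-library dataclass as a stand-in, with a trimmed-down State and a hypothetical reset_with_bookkeeping helper (neither is Brax code). Reassigning the info field is rejected, but the dict it holds can still be mutated in place:

```python
import dataclasses
from typing import Any, Dict


@dataclasses.dataclass(frozen=True)
class State:
  """Stand-in for the environment State; only the fields needed here."""
  obs: float
  info: Dict[str, Any] = dataclasses.field(default_factory=dict)


def reset_with_bookkeeping(state: State) -> State:
  """Hypothetical wrapper step: adds book-keeping keys to state.info in place."""
  state.info['steps'] = 0.0
  state.info['truncation'] = 0.0
  return state


state = reset_with_bookkeeping(State(obs=0.0))
print(sorted(state.info))  # ['steps', 'truncation']

try:
  state.info = {}  # rebinding the field itself is rejected
except dataclasses.FrozenInstanceError:
  print('cannot rebind info on a frozen dataclass')
```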

However, the default environments call self.sys.info to get an Info class object and then discard it, so State.info falls back to a completely empty dictionary. e.g. https://github.com/google/brax/blob/a09f3e75390c1ff017565ba2acfa77e838535fd3/brax/envs/hopper.py#L81

  def reset(self, rng: jp.ndarray) -> brax_env.State:
    """Resets the environment to an initial state."""
    rng, rng1, rng2 = jp.random_split(rng, 3)
    qpos = self.sys.default_angle() + jp.random_uniform(
        rng1, (self.sys.num_joint_dof,), -.005, .005)
    qvel = jp.random_uniform(rng2, (self.sys.num_joint_dof,), -.005, .005)
    qp = self.sys.default_qp(joint_angle=qpos, joint_velocity=qvel)
    info = self.sys.info(qp)
    obs = self._get_obs(qp)
    reward, done, zero = jp.zeros(3)
    metrics = {
        'reward_forward': zero,
        'reward_ctrl': zero,
        'reward_healthy': zero,
    }
    return brax_env.State(qp, obs, reward, done, metrics)

So if I'm reading this right, State.info is not supposed to come from self.sys.info? In that case, the result of self.sys.info will never be returned as part of State. Is this intentional?
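
To make the question concrete, here is a sketch using standard-library stand-ins. PhysicsInfo and its contact_impulse field are hypothetical placeholders for whatever self.sys.info returns, and State is trimmed down. A value computed inside reset but not passed to the constructor never reaches State, so State.info starts out as the empty dict produced by its default_factory:

```python
import dataclasses
from typing import Any, Dict


@dataclasses.dataclass(frozen=True)
class PhysicsInfo:
  """Hypothetical stand-in for the struct returned by the physics system."""
  contact_impulse: float


@dataclasses.dataclass(frozen=True)
class State:
  """Trimmed-down stand-in for the environment State."""
  obs: float
  metrics: Dict[str, float] = dataclasses.field(default_factory=dict)
  info: Dict[str, Any] = dataclasses.field(default_factory=dict)


def reset() -> State:
  info = PhysicsInfo(contact_impulse=0.0)  # computed, then never used
  metrics = {'reward_forward': 0.0}
  return State(0.0, metrics)  # info is not passed, so the default applies


state = reset()
print(state.info)  # {}
```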

namheegordonkim avatar Feb 03 '22 12:02 namheegordonkim

Yes, this is confusing. Info is a struct returned by the brax physics step containing extra metadata (mainly impulses). But it's ALSO extra metadata used by env wrappers for book-keeping. And on top of that, you noticed a useless line in hopper that is probably left over from a refactor. You're right, we're fetching info in hopper and then doing nothing with it. I'll make a note to delete that line.

This will probably become less confusing when we migrate environments out of Brax: the term 'Info' will be less ambiguous because its two usages will be more separate.

erikfrey avatar Feb 06 '22 23:02 erikfrey