brax
brax copied to clipboard
Inconsistency in populating State.info
It seems that the class State defines info as a dictionary, and env wrappers are getting away treating it as such:
@struct.dataclass
class State:
"""Environment state for training and inference."""
qp: brax.QP
obs: jp.ndarray
reward: jp.ndarray
done: jp.ndarray
metrics: Dict[str, jp.ndarray] = struct.field(default_factory=dict)
info: Dict[str, Any] = struct.field(default_factory=dict)
e.g. https://github.com/google/brax/blob/a09f3e75390c1ff017565ba2acfa77e838535fd3/brax/envs/wrappers.py#L67
def reset(self, rng: jp.ndarray) -> brax_env.State:
state = self.env.reset(rng)
state.info['steps'] = jp.zeros(())
state.info['truncation'] = jp.zeros(())
return state
However, default environments use self.sys.info to get an Info class object, but it's dumped in favour of creating a completely empty dictionary.
e.g.
https://github.com/google/brax/blob/a09f3e75390c1ff017565ba2acfa77e838535fd3/brax/envs/hopper.py#L81
def reset(self, rng: jp.ndarray) -> brax_env.State:
"""Resets the environment to an initial state."""
rng, rng1, rng2 = jp.random_split(rng, 3)
qpos = self.sys.default_angle() + jp.random_uniform(
rng1, (self.sys.num_joint_dof,), -.005, .005)
qvel = jp.random_uniform(rng2, (self.sys.num_joint_dof,), -.005, .005)
qp = self.sys.default_qp(joint_angle=qpos, joint_velocity=qvel)
info = self.sys.info(qp)
obs = self._get_obs(qp)
reward, done, zero = jp.zeros(3)
metrics = {
'reward_forward': zero,
'reward_ctrl': zero,
'reward_healthy': zero,
}
return brax_env.State(qp, obs, reward, done, metrics)
So if I'm reading this right, the State.info is not supposed to be coming from self.sys.info? In which case, self.sys.info will never be returned as part of State. Is this intentional?
Yes, this is confusing. Info is a struct returned by the brax physics step containing extra metadata (mainly impulses). But it's ALSO extra metadata used by env wrappers for book-keeping. And on top of that, you noticed a useless line in hopper that is probably left over from a refactor. You're right, we're fetching info in hopper and then doing nothing with it. I'll make a note to delete that line.
This will probably become less confusing when we migrate environments out of Brax, then the term 'Info' will be less ambiguous because their two usages will be more separate.