[Question] Training speed of Orbit compared to IsaacGym
Hi all,
I was using IsaacGym for my humanoid learning before and have recently moved to Orbit.
One thing that has been bothering me for a while is that the training speed in Orbit is about 40% slower than in IsaacGym.
Here are the specifications I used for comparing the training speed of Orbit and IsaacGym (legged_gym from RSL). Robot type: Humanoid, using the same URDF on both sides.
| Parameters | Orbit | IsaacGym |
|---|---|---|
| num_steps_per_env | 24 | 24 |
| sim_dt | 0.001 [s] | 0.001 [s] |
| decimation | 10 | 10 |
| collection time / iteration | 1.38 [s] | 0.85 [s] |
Although I won't list every parameter here, I double-checked that almost all params (PhysX settings, runner settings, etc.) are the same.
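(For reference, a minimal sketch of where these matched settings live on each side. It assumes the standard legged_gym `LeggedRobotCfg` layout and Orbit's `RLTaskEnvCfg`/`SimulationCfg`; the `Humanoid*Cfg` class names are just placeholders for my own configs.)

```python
# Sketch only: where the matched parameters are set on each side.
# (Assumes the standard legged_gym LeggedRobotCfg layout and Orbit's
#  RLTaskEnvCfg/SimulationCfg; the Humanoid*Cfg names are placeholders.)

# --- IsaacGym / legged_gym side ---
from legged_gym.envs.base.legged_robot_config import LeggedRobotCfg

class HumanoidCfg(LeggedRobotCfg):
    class sim(LeggedRobotCfg.sim):
        dt = 0.001                 # physics step [s]

    class control(LeggedRobotCfg.control):
        decimation = 10            # 10 physics steps per policy step

# --- Orbit side ---
from omni.isaac.orbit.envs import RLTaskEnvCfg
from omni.isaac.orbit.sim import SimulationCfg
from omni.isaac.orbit.utils import configclass

@configclass
class HumanoidEnvCfg(RLTaskEnvCfg):
    # scene, observations, actions, rewards, etc. omitted in this sketch
    decimation: int = 10                          # matches control.decimation above
    sim: SimulationCfg = SimulationCfg(dt=0.001)  # matches sim.dt above
```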
While tracking down where the difference originates, I ended up at the simulator update code inside the decimation for-loop.
Here's Orbit's simulator update code in rl_task_env.py:
```python
for _ in range(self.cfg.decimation):
    # set actions into buffers
    self.action_manager.apply_action()  # Line 1
    # set actions into simulator
    self.scene.write_data_to_sim()  # Line 2
    # simulate
    self.sim.step(render=False)  # Line 3
    # update buffers at sim dt
    self.scene.update(dt=self.physics_dt)  # Line 4
```
Here's IsaacGym's simulator update code in legged_robot.py:
```python
for _ in range(self.cfg.control.decimation):
    self.torques = self._compute_torques(self.actions).view(self.torques.shape)  # Line 1
    self.gym.set_dof_actuation_force_tensor(self.sim, gymtorch.unwrap_tensor(self.torques))  # Line 2
    self.gym.simulate(self.sim)  # Line 3
    if self.device == 'cpu':
        self.gym.fetch_results(self.sim, True)
    self.gym.refresh_dof_state_tensor(self.sim)  # Line 4
```
As you can see, Lines 1-4 correspond functionally between the two loops.
So I measured the execution time of each line for a line-by-line comparison, and here are the results:
| Line | Orbit [s] | IsaacGym [s] |
|---|---|---|
| Line 1 | 0.00027 | 0.00186 |
| Line 2 | 0.00417 | 0.00076 |
| Line 3 | 0.03805 | 0.03389 |
| Line 4 | 0.01720 | 0.00065 |
| Total | 0.05983 | 0.03725 |
As you can see, multiplying the difference in total time by num_steps_per_env ((0.05983 - 0.03725) x 24 = 0.542 s) accounts for almost all of the gap in collection time (1.38 - 0.85 = 0.53 s), i.e. it explains why Orbit is roughly 40% slower than IsaacGym in my case.
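(For reference, here is a minimal sketch of how per-line timings like these can be collected, assuming a CUDA device so that the GPU has to be synchronized before reading the clock; otherwise the asynchronous kernel launches make the per-line numbers meaningless.)

```python
import time

import torch


def timed(label: str, fn, *args, **kwargs):
    """Run `fn` once, print how long it took, and return its result."""
    torch.cuda.synchronize()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    torch.cuda.synchronize()
    print(f"{label}: {time.perf_counter() - start:.5f} s")
    return result


# Usage inside the Orbit decimation loop shown above:
#   timed("Line 1", self.action_manager.apply_action)
#   timed("Line 2", self.scene.write_data_to_sim)
#   timed("Line 3", self.sim.step, render=False)
#   timed("Line 4", self.scene.update, dt=self.physics_dt)
```

Note that forcing a synchronization attributes any still-queued GPU work to the line currently being timed, so individual line timings can shift around, but the totals remain comparable between the two frameworks.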
Here are my questions:
- Is Orbit slower than IsaacGym by design? If so, 40% seems like quite a slowdown, and I'm wondering if this will be improved in the future.
- If not (if the training speed should be more or less the same), I'm wondering what I am missing here. From my understanding, Line 3 and Line 4 are the ones that have slowed down significantly: Line 3 is related to the PhysX update, and Line 4 is related to `self.scene`, where we update the robot articulation and the sensors. I would appreciate it if you could point out any important parameter setting I might be missing to make Orbit faster.
Thank you in advance for the help! Best,
I have noticed the same thing. When running legged_gym with the default ANYmal-C rough-terrain task, the iteration time is around 1.1 seconds, while the same task in Isaac Lab is at least 2 seconds per iteration or more. According to the printout and my summaries, most of the increase is in the collection time, not the learning time, so I also suspect it is an issue with Orbit/Isaac Lab. Note that I was running in a containerized cluster environment, so I know I had the same hardware and software as the default config. As far as I can tell, the tasks are the same in legged_gym and in Isaac Lab, so there shouldn't be an increase like this. This nearly doubles the training time, so I would really appreciate it if this could be investigated.
Yes, some slowdown compared to legged_gym is expected, as there are changes in the environment's implementation. Off the top of my head, these are as follows:
- Using a Warp-based ray-caster sensor instead of a height-field-based lookup: the ray-cast sensor allows patterns other than just a height scan, which is generally more useful. However, the ray-cast operation is more expensive than a simple tensor look-up (see the sketch after this list for one way to switch it off when profiling).
- Modularization in the manager-based environment: to make it easier to configure the environment in various ways, we decomposed it into several managers. This does add some overhead to the environment step (~5% when I last investigated), but it shouldn't be the main factor in the slowdown.
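If you want to check how much of the gap comes from the ray caster, a rough sketch of switching it off is below. It assumes your scene names the sensor `height_scanner` and the observation term `height_scan`, as the Orbit locomotion examples do, and `HumanoidRoughEnvCfg` is a placeholder for your own rough-terrain env config.

```python
# Placeholder import: replace with your actual env config class.
from my_tasks.humanoid_rough_env_cfg import HumanoidRoughEnvCfg

env_cfg = HumanoidRoughEnvCfg()
# remove the ray-cast height scanner from the scene ...
env_cfg.scene.height_scanner = None
# ... and drop the observation term that reads it, so the observation
# manager does not try to query the now-missing sensor
env_cfg.observations.policy.height_scan = None
```

Without the height scan the policy of course loses its terrain information, so treat this as a profiling experiment rather than a drop-in replacement.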
Recently, we updated the framework to use lazy-tensors, so we hope that reduces some of the performance gaps you see.
Interesting, thank you for the timely response. Would you recommend I disable the ray casting calculations somehow? Or is that not possible?