[Bug]: Unitree Go2 gets “launched away” when spawned on complex terrain generated by the terrain generator

Open Adwaver4157 opened this issue 11 months ago • 9 comments

Bug Description

Hello, I'm currently encountering an issue with the Unitree Go2 robot model when using a terrain generator (https://github.com/leggedrobotics/terrain-generator) to create complex terrains. As soon as I spawn the Go2 on the generated mesh, the robot instantly flies off to some distant location. Calling the reset function afterwards does not resolve the problem: once the robot has been launched away, it cannot be properly reset.

Context / Steps to Reproduce:

I have a trained model (using go2_eval.py) that was originally trained on a flat plane. In go2_env.py, I load a mesh generated by the terrain generator to create a more complex terrain. Immediately after launching the simulator in go2_eval.py, the robot gets flung away, and calling reset() does not fix the situation.

Additional Details:

The training on a flat plane works without issues. The problem only arises when using the newly generated terrain mesh. Once the robot is “launched” upon spawn, it’s impossible to reset it properly. Any guidance or suggestions on how to prevent this unexpected behavior would be greatly appreciated. Thank you!

[Image: Launched Unitree Go2]
[Image: Mesh I used]

Steps to Reproduce

# I only modified go2_env.py to add the complex terrain mesh.
import torch
import math
import genesis as gs
from genesis.utils.geom import quat_to_xyz, transform_by_quat, inv_quat, transform_quat_by_quat


def gs_rand_float(lower, upper, shape, device):
    return (upper - lower) * torch.rand(size=shape, device=device) + lower


class Go2Env:
    def __init__(self, num_envs, env_cfg, obs_cfg, reward_cfg, command_cfg, show_viewer=False, device="cuda"):
        self.device = torch.device(device)

        self.num_envs = num_envs
        self.num_obs = obs_cfg["num_obs"]
        self.num_privileged_obs = None
        self.num_actions = env_cfg["num_actions"]
        self.num_commands = command_cfg["num_commands"]

        self.simulate_action_latency = True  # there is a 1 step latency on real robot
        self.dt = 0.02  # control frequency on real robot is 50hz
        self.max_episode_length = math.ceil(env_cfg["episode_length_s"] / self.dt)

        self.env_cfg = env_cfg
        self.obs_cfg = obs_cfg
        self.reward_cfg = reward_cfg
        self.command_cfg = command_cfg

        self.obs_scales = obs_cfg["obs_scales"]
        self.reward_scales = reward_cfg["reward_scales"]

        # create scene
        self.scene = gs.Scene(
            sim_options=gs.options.SimOptions(dt=self.dt, substeps=2),
            viewer_options=gs.options.ViewerOptions(
                max_FPS=int(0.5 / self.dt),
                camera_pos=(2.0, 0.0, 2.5),
                camera_lookat=(0.0, 0.0, 0.5),
                camera_fov=40,
            ),
            vis_options=gs.options.VisOptions(n_rendered_envs=1),
            rigid_options=gs.options.RigidOptions(
                dt=self.dt,
                constraint_solver=gs.constraint_solver.Newton,
                enable_collision=True,
                enable_joint_limit=True,
            ),
            show_viewer=show_viewer,
        )

        # add ground plane and generated terrain mesh
        self.scene.add_entity(gs.morphs.URDF(file="urdf/plane/plane.urdf", pos=(0,0,0), fixed=True))
        self.scene.add_entity(gs.morphs.Mesh(file="terrain-generator/results/generated_terrain/mesh_0/mesh.obj", pos=(0,0,-0.2), fixed=True, scale=1.0))
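        # NOTE: gs.morphs.Mesh may convexify the collision mesh by default (assumption),
        # which can distort complex terrain; see the convexify=False suggestion later in this thread.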

        # add robot
        self.base_init_pos = torch.tensor(self.env_cfg["base_init_pos"], device=self.device)
        self.base_init_quat = torch.tensor(self.env_cfg["base_init_quat"], device=self.device)
        self.inv_base_init_quat = inv_quat(self.base_init_quat)
        self.robot = self.scene.add_entity(
            gs.morphs.URDF(
                file="urdf/go2/urdf/go2.urdf",
                pos=self.base_init_pos.cpu().numpy(),
                quat=self.base_init_quat.cpu().numpy(),
            ),
        )

        # build
        self.scene.build(n_envs=num_envs)

        # names to indices
        self.motor_dofs = [self.robot.get_joint(name).dof_idx_local for name in self.env_cfg["dof_names"]]

        # PD control parameters
        self.robot.set_dofs_kp([self.env_cfg["kp"]] * self.num_actions, self.motor_dofs)
        self.robot.set_dofs_kv([self.env_cfg["kd"]] * self.num_actions, self.motor_dofs)

        # prepare reward functions and multiply reward scales by dt
        self.reward_functions, self.episode_sums = dict(), dict()
        for name in self.reward_scales.keys():
            self.reward_scales[name] *= self.dt
            self.reward_functions[name] = getattr(self, "_reward_" + name)
            self.episode_sums[name] = torch.zeros((self.num_envs,), device=self.device, dtype=gs.tc_float)

        # initialize buffers
        self.base_lin_vel = torch.zeros((self.num_envs, 3), device=self.device, dtype=gs.tc_float)
        self.base_ang_vel = torch.zeros((self.num_envs, 3), device=self.device, dtype=gs.tc_float)
        self.projected_gravity = torch.zeros((self.num_envs, 3), device=self.device, dtype=gs.tc_float)
        self.global_gravity = torch.tensor([0.0, 0.0, -1.0], device=self.device, dtype=gs.tc_float).repeat(
            self.num_envs, 1
        )
        self.obs_buf = torch.zeros((self.num_envs, self.num_obs), device=self.device, dtype=gs.tc_float)
        self.rew_buf = torch.zeros((self.num_envs,), device=self.device, dtype=gs.tc_float)
        self.reset_buf = torch.ones((self.num_envs,), device=self.device, dtype=gs.tc_int)
        self.episode_length_buf = torch.zeros((self.num_envs,), device=self.device, dtype=gs.tc_int)
        self.commands = torch.zeros((self.num_envs, self.num_commands), device=self.device, dtype=gs.tc_float)
        self.commands_scale = torch.tensor(
            [self.obs_scales["lin_vel"], self.obs_scales["lin_vel"], self.obs_scales["ang_vel"]],
            device=self.device,
            dtype=gs.tc_float,
        )
        self.actions = torch.zeros((self.num_envs, self.num_actions), device=self.device, dtype=gs.tc_float)
        self.last_actions = torch.zeros_like(self.actions)
        self.dof_pos = torch.zeros_like(self.actions)
        self.dof_vel = torch.zeros_like(self.actions)
        self.last_dof_vel = torch.zeros_like(self.actions)
        self.base_pos = torch.zeros((self.num_envs, 3), device=self.device, dtype=gs.tc_float)
        self.base_quat = torch.zeros((self.num_envs, 4), device=self.device, dtype=gs.tc_float)
        self.default_dof_pos = torch.tensor(
            [self.env_cfg["default_joint_angles"][name] for name in self.env_cfg["dof_names"]],
            device=self.device,
            dtype=gs.tc_float,
        )
        self.extras = dict()  # extra information for logging

    def _resample_commands(self, envs_idx):
        self.commands[envs_idx, 0] = gs_rand_float(*self.command_cfg["lin_vel_x_range"], (len(envs_idx),), self.device)
        self.commands[envs_idx, 1] = gs_rand_float(*self.command_cfg["lin_vel_y_range"], (len(envs_idx),), self.device)
        self.commands[envs_idx, 2] = gs_rand_float(*self.command_cfg["ang_vel_range"], (len(envs_idx),), self.device)

    def step(self, actions):
        self.actions = torch.clip(actions, -self.env_cfg["clip_actions"], self.env_cfg["clip_actions"])
        exec_actions = self.last_actions if self.simulate_action_latency else self.actions
        target_dof_pos = exec_actions * self.env_cfg["action_scale"] + self.default_dof_pos
        self.robot.control_dofs_position(target_dof_pos, self.motor_dofs)
        self.scene.step()

        # update buffers
        self.episode_length_buf += 1
        self.base_pos[:] = self.robot.get_pos()
        self.base_quat[:] = self.robot.get_quat()
        self.base_euler = quat_to_xyz(
            transform_quat_by_quat(torch.ones_like(self.base_quat) * self.inv_base_init_quat, self.base_quat)
        )
        inv_base_quat = inv_quat(self.base_quat)
        self.base_lin_vel[:] = transform_by_quat(self.robot.get_vel(), inv_base_quat)
        self.base_ang_vel[:] = transform_by_quat(self.robot.get_ang(), inv_base_quat)
        self.projected_gravity = transform_by_quat(self.global_gravity, inv_base_quat)
        self.dof_pos[:] = self.robot.get_dofs_position(self.motor_dofs)
        self.dof_vel[:] = self.robot.get_dofs_velocity(self.motor_dofs)

        # resample commands
        envs_idx = (
            (self.episode_length_buf % int(self.env_cfg["resampling_time_s"] / self.dt) == 0)
            .nonzero(as_tuple=False)
            .flatten()
        )
        self._resample_commands(envs_idx)

        # check termination and reset
        self.reset_buf = self.episode_length_buf > self.max_episode_length
        self.reset_buf |= torch.abs(self.base_euler[:, 1]) > self.env_cfg["termination_if_pitch_greater_than"]
        self.reset_buf |= torch.abs(self.base_euler[:, 0]) > self.env_cfg["termination_if_roll_greater_than"]

        time_out_idx = (self.episode_length_buf > self.max_episode_length).nonzero(as_tuple=False).flatten()
        self.extras["time_outs"] = torch.zeros_like(self.reset_buf, device=self.device, dtype=gs.tc_float)
        self.extras["time_outs"][time_out_idx] = 1.0

        self.reset_idx(self.reset_buf.nonzero(as_tuple=False).flatten())

        # compute reward
        self.rew_buf[:] = 0.0
        for name, reward_func in self.reward_functions.items():
            rew = reward_func() * self.reward_scales[name]
            self.rew_buf += rew
            self.episode_sums[name] += rew

        # compute observations
        self.obs_buf = torch.cat(
            [
                self.base_ang_vel * self.obs_scales["ang_vel"],  # 3
                self.projected_gravity,  # 3
                self.commands * self.commands_scale,  # 3
                (self.dof_pos - self.default_dof_pos) * self.obs_scales["dof_pos"],  # 12
                self.dof_vel * self.obs_scales["dof_vel"],  # 12
                self.actions,  # 12
            ],
            axis=-1,
        )

        self.last_actions[:] = self.actions[:]
        self.last_dof_vel[:] = self.dof_vel[:]

        return self.obs_buf, None, self.rew_buf, self.reset_buf, self.extras

    def get_observations(self):
        return self.obs_buf

    def get_privileged_observations(self):
        return None

    def reset_idx(self, envs_idx):
        if len(envs_idx) == 0:
            return

        # reset dofs
        self.dof_pos[envs_idx] = self.default_dof_pos
        self.dof_vel[envs_idx] = 0.0
        self.robot.set_dofs_position(
            position=self.dof_pos[envs_idx],
            dofs_idx_local=self.motor_dofs,
            zero_velocity=True,
            envs_idx=envs_idx,
        )

        # reset base
        self.base_pos[envs_idx] = self.base_init_pos
        self.base_quat[envs_idx] = self.base_init_quat.reshape(1, -1)
        self.robot.set_pos(self.base_pos[envs_idx], zero_velocity=False, envs_idx=envs_idx)
        self.robot.set_quat(self.base_quat[envs_idx], zero_velocity=False, envs_idx=envs_idx)
        self.base_lin_vel[envs_idx] = 0
        self.base_ang_vel[envs_idx] = 0
        self.robot.zero_all_dofs_velocity(envs_idx)

        # reset buffers
        self.last_actions[envs_idx] = 0.0
        self.last_dof_vel[envs_idx] = 0.0
        self.episode_length_buf[envs_idx] = 0
        self.reset_buf[envs_idx] = True
        print("reset")
        # fill extras
        self.extras["episode"] = {}
        for key in self.episode_sums.keys():
            self.extras["episode"]["rew_" + key] = (
                torch.mean(self.episode_sums[key][envs_idx]).item() / self.env_cfg["episode_length_s"]
            )
            self.episode_sums[key][envs_idx] = 0.0

        self._resample_commands(envs_idx)

    def reset(self):
        self.reset_buf[:] = True
        self.reset_idx(torch.arange(self.num_envs, device=self.device))
        return self.obs_buf, None

    # ------------ reward functions----------------
    def _reward_tracking_lin_vel(self):
        # Tracking of linear velocity commands (xy axes)
        lin_vel_error = torch.sum(torch.square(self.commands[:, :2] - self.base_lin_vel[:, :2]), dim=1)
        return torch.exp(-lin_vel_error / self.reward_cfg["tracking_sigma"])

    def _reward_tracking_ang_vel(self):
        # Tracking of angular velocity commands (yaw)
        ang_vel_error = torch.square(self.commands[:, 2] - self.base_ang_vel[:, 2])
        return torch.exp(-ang_vel_error / self.reward_cfg["tracking_sigma"])

    def _reward_lin_vel_z(self):
        # Penalize z axis base linear velocity
        return torch.square(self.base_lin_vel[:, 2])

    def _reward_action_rate(self):
        # Penalize changes in actions
        return torch.sum(torch.square(self.last_actions - self.actions), dim=1)

    def _reward_similar_to_default(self):
        # Penalize joint poses far away from default pose
        return torch.sum(torch.abs(self.dof_pos - self.default_dof_pos), dim=1)

    def _reward_base_height(self):
        # Penalize base height away from target
        return torch.square(self.base_pos[:, 2] - self.reward_cfg["base_height_target"])

Expected Behavior

Go2 starts walking on complex terrain without being launched (though it may stumble)

Screenshots/Videos

No response

Relevant log output


Environment

  • OS: Ubuntu 20.04
  • GPU/CPU: RTX3090
  • GPU-driver version: 550
  • CUDA / CUDA-toolkit version: 12.1

Release version or Commit ID

Genesis version: 0.2.1
Commit ID: 1f37930cef144b6220a738448330dbeb6fa9f2fa

Additional Context

No response

Adwaver4157 avatar Jan 22 '25 06:01 Adwaver4157

Could you check if your initial robot location is a "free space", meaning no collision is happening?
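One quick way to check this, as a minimal sketch that only uses calls already present in the go2_env.py code above (the step count and the 0.5 m drift threshold are arbitrary choices): build the env, command zero actions so the PD controller just holds the default pose, and watch whether the base jumps away from base_init_pos.

import torch

# assumes `env` is an already-built Go2Env instance (see go2_env.py above)
obs, _ = env.reset()
spawn_pos = env.base_init_pos.unsqueeze(0)  # (1, 3), broadcast against (num_envs, 3)

for i in range(25):  # ~0.5 s of simulation at dt = 0.02
    zero_actions = torch.zeros((env.num_envs, env.num_actions), device=env.device)
    obs, _, rew, reset, extras = env.step(zero_actions)
    drift = torch.norm(env.robot.get_pos() - spawn_pos, dim=-1)
    if torch.isnan(drift).any() or (drift > 0.5).any():
        print(f"step {i}: spawn pose likely in collision, max drift = {drift.max().item():.2f} m")
        break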

Kashu7100 avatar Jan 22 '25 08:01 Kashu7100

Even when I start the robot on the flat plane instead of on the mesh, it suddenly stops near the mesh. Based on the normals, there should not be any obstacles there.

https://github.com/user-attachments/assets/03c53d46-ae75-4bd7-8ffe-fe026d6ee279 https://github.com/user-attachments/assets/0bb63b27-fdbd-4519-b058-9634887259f2

Adwaver4157 avatar Jan 23 '25 10:01 Adwaver4157

I think this has something to do with the convexify option. Can you try setting convexify=False for your asset?

morph=gs.morphs.Mesh(
        ...
        convexify=False,
    ),
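Applied to the mesh entity from go2_env.py above, that would look roughly like this (only convexify=False is new; the other arguments are copied from the issue's code):

self.scene.add_entity(
    gs.morphs.Mesh(
        file="terrain-generator/results/generated_terrain/mesh_0/mesh.obj",
        pos=(0, 0, -0.2),
        fixed=True,
        scale=1.0,
        convexify=False,  # keep the original non-convex collision geometry
    )
)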

Kashu7100 avatar Jan 24 '25 13:01 Kashu7100

I have the same problem when dropping my quadruped robot onto terrain produced by the provided terrain generator, and I am not using a mesh collider on my robot. Inevitably, with a higher number of envs (10k), at least one robot will return an all-NaN state after colliding with the ground. What happens is that the robot collides with the ground and then shoots off into infinity. This happens with both the Newton and CG solvers. As you can imagine, Genesis is unusable with this problem, since the whole simulation crashes.

Even if I replace my robot model with the go2 URDF, I still get NaNs if the timestep is >= 0.004. I also don't get the NaN issue if I let my robot fall onto a flat_terrain Terrain or a plane.

Image

In the example above, one robot (index 2240) had the following .get_pos history. You can see that near the end the values blow up.

[-3.0772e+00, -6.0231e-01, -4.4113e-01],
        [-2.9892e+00, -3.4030e-01,  8.9290e-02],
        [-2.8068e+00,  1.2063e-01,  7.8904e-01],
        [-2.5196e+00,  6.2944e-02,  1.4499e+00],
        [ 2.3027e+00, -2.7093e+00, -7.3430e-01],
        [ 5.6343e+00,  1.5064e+00,  2.4574e+00],
        [-5.1093e+02,  2.2676e+01,  7.0654e+02]], device='cuda:0')

Nate711 avatar Jan 25 '25 19:01 Nate711

@Kashu7100 When using the terrain generated by morphs.Terrain, there doesn't seem to be a convexify option, since you don't directly create it as a mesh?

Nate711 avatar Jan 25 '25 19:01 Nate711

@Nate711 Sorry for the delay, can you check your dt settings? (As pointed out in #653.)
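For reference, the knobs in question live in the scene construction in go2_env.py above (SimOptions and RigidOptions). A smaller step might look like the sketch below; the concrete values are illustrative, not a confirmed fix, and note that max_episode_length, the reward scaling, and resampling_time_s all depend on self.dt as well.

self.dt = 0.005  # was 0.02
self.scene = gs.Scene(
    sim_options=gs.options.SimOptions(dt=self.dt, substeps=4),  # more substeps per control step
    rigid_options=gs.options.RigidOptions(
        dt=self.dt,
        constraint_solver=gs.constraint_solver.Newton,
        enable_collision=True,
        enable_joint_limit=True,
    ),
    # viewer_options / vis_options omitted here, same as in the original code
    show_viewer=show_viewer,
)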

Kashu7100 avatar Feb 03 '25 23:02 Kashu7100

@Kashu7100 Thanks for the response! I tried decreasing dt to as low as 0.0004 s, but I still see robots fly away or reach abnormally high velocities.

Please try this script to reproduce the issue: https://github.com/Nate711/Genesis-backflip/blob/main/flying_away_bug.py. In this script I reset the position, orientation, and DOFs of any robots whose link velocities exceed 100 m/s. However, as you can see from the video below, many robots still fly off into space!

https://github.com/user-attachments/assets/34941475-88db-4bab-8d12-b54bde9ebcd1
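For anyone who wants to try the same workaround, here is a minimal sketch of such a guard, added at the end of Go2Env.step() from the code above (the 100 m/s threshold is arbitrary, and this only masks the underlying instability rather than fixing it):

# inside Go2Env.step(), after self.scene.step() and the buffer updates
bad = torch.norm(self.robot.get_vel(), dim=-1) > 100.0   # exploding base velocity
bad |= torch.isnan(self.base_pos).any(dim=-1)            # NaN base position
bad |= torch.isnan(self.dof_pos).any(dim=-1)             # NaN joint positions
bad_idx = bad.nonzero(as_tuple=False).flatten()
if len(bad_idx) > 0:
    self.reset_idx(bad_idx)  # snap the offending envs back to the initial pose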

Nate711 avatar Feb 17 '25 08:02 Nate711

I have a similar issue, though I initialize the robot at a relatively low height. The environment just returns NaN values in base_pos, dof_pos, dof_vel, and other robot states. I have provided code to reproduce it at https://github.com/lupinjia/genesis_lr/tree/self-imple.

My test results

  • For num_envs=10000, use_implicit_controller=False, dt=0.02, decimation=4

    The training session terminates at iteration 24. I implemented a rendered_envs_idx utility, as in https://github.com/Genesis-Embodied-AI/Genesis/pull/752, and rendered the abnormal env (index 9405). As the following video shows, the robot just scratches at the ground, trying to learn to walk, and then the NaN values suddenly appear.

https://github.com/user-attachments/assets/616c1e99-cff9-4d29-80f3-4cdbe927ef45

  • For num_envs=8192, use_implicit_controller=False, dt=0.02, decimation=4

    NaN values now appear at iteration 1532.

  • For num_envs=10000, use_implicit_controller=False, dt=0.005, decimation=4

    NaN values now appear at iteration 726.

Image

  • For num_envs=10000, use_implicit_controller=True, dt=0.02, decimation=4

    NaN values now appear at iteration 749.

Image

Comments

Although the policy can learn to walk before the NaN values appear under some configurations, the sudden NaNs are a real problem for anyone who wants to keep training for better robustness and performance.

lupinjia avatar Feb 23 '25 04:02 lupinjia

This issue may have been fixed by https://github.com/Genesis-Embodied-AI/Genesis/pull/1777, but that still needs to be confirmed.

duburcqa avatar Sep 27 '25 09:09 duburcqa