
[Bug]: `self.drone.get_quat()` returns `nan` in hover env.

Open Nam-dada opened this issue 8 months ago • 7 comments

Bug Description

I am currently trying to use the algorithms from EvoX to train environments provided in Genesis. While attempting to train the hover_env, I found that after a certain number of iterations, the returned reward becomes nan.

To investigate this, I did some local debugging and found that in hover_env.py the return value of self.drone.get_quat() may become nan for some environments (individuals in my EvoX population). The nan values then propagate into the computation of self.base_euler, which in turn corrupts crash_condition, reset_buf, and _reward_yaw, eventually leading to incorrect behavior.
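
For reference, here is a minimal sketch of the kind of check I used while debugging; the attribute names (env.drone, env.base_euler) are assumptions based on the public hover_env.py:

import torch

# Hypothetical debugging snippet: locate which environments return a nan quaternion
# and confirm that the nan propagates into the derived Euler angles.
quat = env.drone.get_quat()                       # shape: (num_envs, 4)
bad_envs = torch.isnan(quat).any(dim=-1)          # mask of affected environments
print("envs with nan quaternion:", bad_envs.nonzero(as_tuple=True)[0].tolist())
print("nan in base_euler:", torch.isnan(env.base_euler).any().item())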

Steps to Reproduce

Since the code is still under development, I am currently unable to share all the scripts. However, I will list some of the PyTorch-related settings I am using below.

import torch
import genesis as gs
from hover_env import HoverEnv

torch.set_float32_matmul_precision("high")
torch.set_default_device("cuda" if torch.cuda.is_available() else "cpu")
seed = 1234
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

def get_cfgs():
    env_cfg = {
        "num_actions": 4,
        # termination
        "termination_if_roll_greater_than": 180,  # degree
        "termination_if_pitch_greater_than": 180,
        "termination_if_close_to_ground": 0.1,
        "termination_if_x_greater_than": 3.0,
        "termination_if_y_greater_than": 3.0,
        "termination_if_z_greater_than": 2.0,
        # base pose
        "base_init_pos": [0.0, 0.0, 1.0],
        "base_init_quat": [1.0, 0.0, 0.0, 0.0],
        "episode_length_s": 15.0,
        "at_target_threshold": 0.1,
        "resampling_time_s": 3.0,
        "simulate_action_latency": True,
        "clip_actions": 1.0,
        # visualization
        "visualize_target": False,
        "visualize_camera": False,
        "max_visualize_FPS": 60,
    }
    obs_cfg = {
        "num_obs": 17,
        "obs_scales": {
            "rel_pos": 1 / 3.0,
            "lin_vel": 1 / 3.0,
            "ang_vel": 1 / 3.14159,
        },
    }
    reward_cfg = {
        "yaw_lambda": -10.0,
        "reward_scales": {
            "target": 10.0,
            "smooth": -1e-4,
            "yaw": 0.01,
            "angular": -2e-4,
            "crash": -10.0,
        },
    }
    command_cfg = {
        "num_commands": 3,
        "pos_x_range": [-1.0, 1.0],
        "pos_y_range": [-1.0, 1.0],
        "pos_z_range": [1.0, 1.0],
    }

    return env_cfg, obs_cfg, reward_cfg, command_cfg
env_cfg, obs_cfg, reward_cfg, command_cfg = get_cfgs()

gs.init(backend=gs.gpu, precision="32", logging_level="error")
env = HoverEnv(
    num_envs=512,
    env_cfg=env_cfg,
    obs_cfg=obs_cfg,
    reward_cfg=reward_cfg,
    command_cfg=command_cfg,
    show_viewer=False,
)

Expected Behavior

The hover_env should work correctly, and self.drone.get_quat() should return values without nan.

Screenshots/Videos

No response

Relevant log output

torch.where(torch.isnan(self.drone.get_quat()))
(tensor([135, 135, 13...='cuda:0'), tensor([0, 1, 2, 3],...='cuda:0'))
torch.where(torch.isnan(self.base_euler))
(tensor([135, 135, 13...='cuda:0'), tensor([0, 1, 2], de...='cuda:0'))

Environment

  • OS: Windows 10 IoT Enterprise LTSC
  • GPU/CPU: RTX 4060 Ti, Intel i7-10700K
  • GPU-driver version: 572.42
  • CUDA / CUDA-toolkit version: 12.8

Release version or Commit ID

pip install genesis-world; genesis-world 0.2.1

Additional Context

It is worth mentioning that I tested my code on both RTX 2080 Ti and RTX 3090, and it ran without any issues — the returned reward values were valid. However, when running the same code on RTX 4060 Ti and RTX 4090, the bug occurred. I’m not sure if this information is helpful, but I hope it provides some insight.

Nam-dada avatar Mar 22 '25 01:03 Nam-dada

Due to the rapid updates in Genesis, I recommend using the latest version for testing. You can install it using the following commands:

git clone https://github.com/Genesis-Embodied-AI/Genesis.git
cd Genesis
pip install -e .

I also recommend running on Ubuntu for the best experience. Windows users may encounter compatibility issues with some simulation components, while Ubuntu provides the most stable environment for development and testing with Genesis.

KafuuChikai avatar Mar 26 '25 13:03 KafuuChikai

Hello,

Thank you for your prompt response and the code update. I have updated Genesis and the hover_env to commit 2d9dad444446125b6391c6810eb74d8d69259622 and tested them in both my local Windows environment and my WSL environment (on the same device). The same code produces nan values after almost the same number of iterations.

I am planning to test whether varying the num_envs parameter will trigger the bug. Additionally, I intend to set up a new Ubuntu Docker container on a server equipped with an RTX 4090 to further test the code and see if the issue persists.

To be honest, I'm starting to feel a bit exhausted by this issue. I've spent more than half a month trying to identify the cause of the bug, but ultimately I haven't succeeded. Perhaps I should consider alternative solutions, such as replacing the nan values generated in the _reward_yaw function with specific values. Do you have any suggestions?
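
For concreteness, a rough sketch of that kind of workaround is below, assuming the yaw term is computed from self.base_euler with a yaw_lambda coefficient as in the public hover_env.py (the attribute names are assumptions, and this only masks the nan rather than fixing the root cause):

# Illustrative workaround inside a _reward_yaw-style function; not a real fix.
yaw = self.base_euler[:, 2]
yaw = torch.nan_to_num(yaw, nan=0.0)  # replace nan with a neutral yaw of 0
yaw_rew = torch.exp(self.reward_cfg["yaw_lambda"] * torch.abs(yaw))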

Nam-dada avatar Apr 01 '25 09:04 Nam-dada

@Nam-dada I can try to help you on this.

duburcqa avatar Apr 01 '25 09:04 duburcqa

How long does it take before it starts returning nan? Do you have a self-contained reproduction script?

duburcqa avatar Apr 01 '25 09:04 duburcqa

Hello,

I uploaded a file, all_actions.pkl, to Google Drive to help you reproduce the bug. This file is from my own experiment. Note that NaN values start appearing from index 554 in this file. However, when I attempted to reproduce the bug locally, the reward value returned by the environment began showing NaN around index 500. The reproduction script is below:

import torch
from hover_env import HoverEnv
import genesis as gs
import pickle
import os
torch.set_default_device("cuda" if torch.cuda.is_available() else "cpu")
seed = 1234
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.set_float32_matmul_precision("high")

def get_cfgs():
    env_cfg = {
        "num_actions": 4,
        # termination
        "termination_if_roll_greater_than": 180,  # degree
        "termination_if_pitch_greater_than": 180,
        "termination_if_close_to_ground": 0.1,
        "termination_if_x_greater_than": 3.0,
        "termination_if_y_greater_than": 3.0,
        "termination_if_z_greater_than": 2.0,
        # base pose
        "base_init_pos": [0.0, 0.0, 1.0],
        "base_init_quat": [1.0, 0.0, 0.0, 0.0],
        "episode_length_s": 15.0,
        "at_target_threshold": 0.1,
        "resampling_time_s": 3.0,
        "simulate_action_latency": True,
        "clip_actions": 1.0,
        # visualization
        "visualize_target": False,
        "visualize_camera": False,
        "max_visualize_FPS": 60,
    }
    obs_cfg = {
        "num_obs": 17,
        "obs_scales": {
            "rel_pos": 1 / 3.0,
            "lin_vel": 1 / 3.0,
            "ang_vel": 1 / 3.14159,
        },
    }
    reward_cfg = {
        "yaw_lambda": -10.0,
        "reward_scales": {
            "target": 10.0,
            "smooth": -1e-4,
            "yaw": 0.01,
            "angular": -2e-4,
            "crash": -10.0,
        },
    }
    command_cfg = {
        "num_commands": 3,
        "pos_x_range": [-1.0, 1.0],
        "pos_y_range": [-1.0, 1.0],
        "pos_z_range": [1.0, 1.0],
    }

    return env_cfg, obs_cfg, reward_cfg, command_cfg



device = "cuda" if torch.cuda.is_available() else "cpu"

save_dir = "your_file_path" # replace with your path.
file_path = os.path.join(save_dir, "all_actions.pkl")


with open(file_path, 'rb') as f:
    all_actions = pickle.load(f)
all_actions = [torch.tensor(action, device=device) for action in all_actions]
    
gs.init(backend=gs.gpu, precision="32", logging_level="error")
env_cfg, obs_cfg, reward_cfg, command_cfg = get_cfgs()
env = HoverEnv(
    num_envs=10000,
    env_cfg=env_cfg,
    obs_cfg=obs_cfg,
    reward_cfg=reward_cfg,
    command_cfg=command_cfg,
    show_viewer=False,
)

for ac in all_actions:
    obs, _, reward, _, _ = env.step(ac)
    print(torch.sum(torch.isnan(reward)))  # show whether any reward values are nan

Nam-dada avatar Apr 01 '25 10:04 Nam-dada

Thank you! I will have a look as soon as I have some bandwidth, hopefully tomorrow.

duburcqa avatar Apr 01 '25 17:04 duburcqa

I can reproduce the nan issue. I will have a look ASAP. Sorry for the delay.

duburcqa avatar Apr 10 '25 16:04 duburcqa

I had this issue with the hover env as well, and I have also run into it while building a framework for drone training (based on gym-pybullet-drones).

Drones seem to just "die", with their position and quat both becoming NaN. Hopefully this can be fixed soon.

Edit: I looked into this further; the drone position often jumps to an unrealistically high set of values, and then all drone info becomes NaN on the next step. The drone is being controlled by a random, unchanging set of RPMs.
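
A minimal sketch of a check for that pattern is below (drone stands for the entity handle in my framework; get_pos() follows the Genesis rigid-entity API, and the 1e5 threshold is arbitrary):

import torch

# Rough check: flag the step where the position magnitude explodes, which in my
# runs happens right before all drone state turns NaN on the following step.
pos = drone.get_pos()                          # (num_envs, 3) base position
exploded = (pos.abs() > 1e5).any(dim=-1)       # far beyond anything physical
if exploded.any():
    print("position blow-up in envs:", exploded.nonzero(as_tuple=True)[0].tolist())
if torch.isnan(pos).any():
    print("drone state is already NaN")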

lukeclausi avatar Jun 09 '25 17:06 lukeclausi

Have you tried on the latest Genesis main branch?

duburcqa avatar Jun 09 '25 18:06 duburcqa

I just reinstalled from main and am having the same issue.

It might be tied to the drone's acceleration. When the set RPMs are lower / the drone is less aggressive, the issue does not occur (or occurs later).

lukeclausi avatar Jun 09 '25 18:06 lukeclausi

For the record, here is a simple standalone script that triggers the issue:

import torch

import genesis as gs

from hover_env import HoverEnv


gs.init(backend=gs.cpu, precision="32", logging_level="error", seed=0)

env = HoverEnv(
    env_cfg={
        "num_actions": 4,
        # termination
        "termination_if_roll_greater_than": 180,  # degree
        "termination_if_pitch_greater_than": 180,
        "termination_if_close_to_ground": 0.1,
        "termination_if_x_greater_than": 3.0,
        "termination_if_y_greater_than": 3.0,
        "termination_if_z_greater_than": 2.0,
        # base pose
        "base_init_pos": [0.0, 0.0, 1.0],
        "base_init_quat": [1.0, 0.0, 0.0, 0.0],
        "episode_length_s": 15.0,
        "at_target_threshold": 0.1,
        "resampling_time_s": 3.0,
        "simulate_action_latency": True,
        "clip_actions": 1.0,
        # visualization
        "visualize_target": False,
        "visualize_camera": False,
        "max_visualize_FPS": 60,
    },
    obs_cfg={
        "num_obs": 17,
        "obs_scales": {
            "rel_pos": 1 / 3.0,
            "lin_vel": 1 / 3.0,
            "ang_vel": 1 / 3.14159,
        },
    },
    reward_cfg={
        "yaw_lambda": -10.0,
        "reward_scales": {
            "target": 10.0,
            "smooth": -1e-4,
            "yaw": 0.01,
            "angular": -2e-4,
            "crash": -10.0,
        },
    },
    command_cfg={
        "num_commands": 3,
        "pos_x_range": [-1.0, 1.0],
        "pos_y_range": [-1.0, 1.0],
        "pos_z_range": [1.0, 1.0],
    },
    show_viewer=False,
    num_envs=1,
)

action = torch.tensor([[-0.5, 0.77, -1.0, 0.99]], dtype=gs.tc_float, device=gs.device)

i = 0
iter_exploding = -1
env.reset()
while True:
    env.step(action)
    num_constraints = env.scene.rigid_solver.constraint_solver.n_constraints.to_torch().item()
    acc = env.scene.rigid_solver.dofs_state.acc.to_torch()[:, 0]
    print(f"[{i}] num constraints: {num_constraints}")
    print(f"[{i}] acc: {acc}")
    i += 1

    if iter_exploding != -1:
        if torch.isnan(acc).any():
            break
    else:
        if (torch.abs(acc) > 1e5).any():
            iter_exploding = i

I will try to find time to investigate. The Cartesian acceleration seems to explode at some point, while the angular acceleration is very large but the simulation can apparently cope with it.

[93] num constraints: 0
[93] acc: tensor([-9.8023e+00, -3.9377e+00,  9.5481e-01, -1.6567e+09, -9.1273e+09,
        -3.2211e+05])
[94] num constraints: 0
[94] acc: tensor([-9.5669e+00,  1.3682e+01, -3.6455e+00,  5.2567e+12,  1.9517e+11,
         2.7756e+07])
[95] num constraints: 0
[95] acc: tensor([ 3.4887e+11, -2.6254e+12,  1.4832e+12,  1.7900e+23,  5.8323e+21,
         1.6306e+19])
[96] num constraints: 0
[96] acc: tensor([nan, nan, nan, nan, nan, nan])

It may be worth adding some very small damping on the floating base to emulate energy dissipation with the air.

duburcqa avatar Jun 23 '25 09:06 duburcqa

The problem is that the angular velocity of the drone becomes too large. Normally, a physical drone doesn't rotate faster than, say, 20 rad/s. The easiest solution is to add damping to the drone.

Solution:

# in HoverEnv class,  
# right after self.scene.build(n_envs=num_envs)

# add the following line
self.drone.set_dofs_damping(
    torch.tensor([0.0, 0.0, 0.0, 1e-4, 1e-4, 1e-4])
)  # Set damping to a small value to avoid numerical instability

Additionally, you can combine damping with angular-velocity clipping, since data collected beyond a certain rotation speed is useless for RL training.
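
For example, a rough sketch of such clipping applied after each physics step is below. The 20 rad/s limit is illustrative, the drone is assumed to be a single free-floating base with 6 DOFs (3 linear + 3 angular), and get_dofs_velocity / set_dofs_velocity are used as in the rigid-entity API:

# Illustrative angular-velocity clipping; call right after the physics step in HoverEnv.step.
max_ang_vel = 20.0  # rad/s, roughly the limit of a physical quadrotor
vel = self.drone.get_dofs_velocity()           # (num_envs, 6): linear + angular DOFs
vel[..., 3:] = torch.clamp(vel[..., 3:], -max_ang_vel, max_ang_vel)
self.drone.set_dofs_velocity(vel)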

yun-long avatar Jul 03 '25 00:07 yun-long

There is still something strange happening here. The Cartesian acceleration is exploding in a single step. This should not happen.

duburcqa avatar Jul 03 '25 06:07 duburcqa