Genesis
[Bug]: `self.drone.get_quat()` returns `nan` in hover env.
Bug Description
I am currently using the algorithms from EvoX to train environments provided in Genesis. While attempting to train hover_env, I found that after a certain number of iterations the returned reward becomes nan.
To investigate, I debugged locally and discovered that in hover_env.py the return value of self.drone.get_quat() may become nan for some individuals. This then corrupts the computation of self.base_euler, which in turn affects the calculation of crash_condition, reset_buf, and _reward_yaw, eventually leading to incorrect behavior.
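For reference, here is a minimal diagnostic sketch of how the affected environments can be located before the nan propagates (find_nan_quat_envs is a hypothetical helper; it only assumes that self.drone.get_quat() returns a (num_envs, 4) tensor, as in hover_env.py):

import torch


def find_nan_quat_envs(quat: torch.Tensor) -> torch.Tensor:
    # Return the indices of environments whose quaternion contains nan.
    # `quat` is an (num_envs, 4) tensor such as the one returned by
    # self.drone.get_quat() in hover_env.py.
    return torch.isnan(quat).any(dim=1).nonzero(as_tuple=False).flatten()


# Hypothetical usage inside HoverEnv.step(), before self.base_euler is computed:
#     bad_envs = find_nan_quat_envs(self.drone.get_quat())
#     if bad_envs.numel() > 0:
#         print(f"nan quaternion in envs: {bad_envs.tolist()}")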
Steps to Reproduce
Since the code is still under development, I am currently unable to share all the scripts. However, I will list some of the PyTorch-related settings I am using below.
import torch

import genesis as gs
from hover_env import HoverEnv

torch.set_float32_matmul_precision("high")
torch.set_default_device("cuda" if torch.cuda.is_available() else "cpu")
seed = 1234
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)


def get_cfgs():
    env_cfg = {
        "num_actions": 4,
        # termination
        "termination_if_roll_greater_than": 180,  # degree
        "termination_if_pitch_greater_than": 180,
        "termination_if_close_to_ground": 0.1,
        "termination_if_x_greater_than": 3.0,
        "termination_if_y_greater_than": 3.0,
        "termination_if_z_greater_than": 2.0,
        # base pose
        "base_init_pos": [0.0, 0.0, 1.0],
        "base_init_quat": [1.0, 0.0, 0.0, 0.0],
        "episode_length_s": 15.0,
        "at_target_threshold": 0.1,
        "resampling_time_s": 3.0,
        "simulate_action_latency": True,
        "clip_actions": 1.0,
        # visualization
        "visualize_target": False,
        "visualize_camera": False,
        "max_visualize_FPS": 60,
    }
    obs_cfg = {
        "num_obs": 17,
        "obs_scales": {
            "rel_pos": 1 / 3.0,
            "lin_vel": 1 / 3.0,
            "ang_vel": 1 / 3.14159,
        },
    }
    reward_cfg = {
        "yaw_lambda": -10.0,
        "reward_scales": {
            "target": 10.0,
            "smooth": -1e-4,
            "yaw": 0.01,
            "angular": -2e-4,
            "crash": -10.0,
        },
    }
    command_cfg = {
        "num_commands": 3,
        "pos_x_range": [-1.0, 1.0],
        "pos_y_range": [-1.0, 1.0],
        "pos_z_range": [1.0, 1.0],
    }
    return env_cfg, obs_cfg, reward_cfg, command_cfg


env_cfg, obs_cfg, reward_cfg, command_cfg = get_cfgs()
gs.init(backend=gs.gpu, precision="32", logging_level="error")
env = HoverEnv(
    num_envs=512,
    env_cfg=env_cfg,
    obs_cfg=obs_cfg,
    reward_cfg=reward_cfg,
    command_cfg=command_cfg,
    show_viewer=False,
)
Expected Behavior
The hover_env should work correctly, and self.drone.get_quat() should return values without nan.
Screenshots/Videos
No response
Relevant log output
>>> torch.where(torch.isnan(self.drone.get_quat()))
(tensor([135, 135, 13...='cuda:0'), tensor([0, 1, 2, 3],...='cuda:0'))
>>> torch.where(torch.isnan(self.base_euler))
(tensor([135, 135, 13...='cuda:0'), tensor([0, 1, 2], de...='cuda:0'))
Environment
- OS: Windows 10 IoT Enterprise LTSC
- GPU/CPU: RTX 4060 Ti, Intel i7-10700K
- GPU-driver version 572.42
- CUDA / CUDA-toolkit version 12.8
Release version or Commit ID
pip install genesis-world; genesis-world 0.2.1
Additional Context
It is worth mentioning that I tested my code on both an RTX 2080 Ti and an RTX 3090, and it ran without any issues: the returned reward values were valid. However, when running the same code on an RTX 4060 Ti and an RTX 4090, the bug occurred. I'm not sure whether this information is helpful, but I hope it provides some insight.
Due to the rapid updates in Genesis, I recommend using the latest version for testing. You can install it using the following commands:
git clone https://github.com/Genesis-Embodied-AI/Genesis.git
cd Genesis
pip install -e .
I also recommend running on Ubuntu for the best experience. Windows users may encounter compatibility issues with some simulation components, while Ubuntu provides the most stable environment for development and testing with Genesis.
Hello,
Thank you for your prompt response and the code update. I have updated the versions of Genesis and the hover_env to commit 2d9dad444446125b6391c6810eb74d8d69259622, and I tested them on both my local Windows environment and WSL environment (on the same device). After testing, I found that the same code produces nan values after almost the same number of iterations.
I am planning to test whether varying the num_envs parameter will trigger the bug. Additionally, I intend to set up a new Ubuntu Docker container on a server equipped with an RTX 4090 to further test the code and see if the issue persists.
To be honest, I'm starting to feel a bit exhausted by this issue. I've spent more than half a month trying to identify the cause of the bug, but ultimately I haven't succeeded. Perhaps I should consider alternative solutions, such as replacing the nan values generated in the _reward_yaw function with specific values. Do you have any suggestions?
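If you go down that route, one possible workaround is a small guard on the yaw angle before it enters the reward. Below is a minimal sketch only: the exact formula in _reward_yaw may differ from this approximation, and the nan_to_num call is the only essential part.

import torch


def safe_yaw_reward(yaw_deg: torch.Tensor, yaw_lambda: float = -10.0) -> torch.Tensor:
    # Replace nan yaw angles (coming from a corrupted quaternion) with 0 so a
    # single bad environment does not poison the whole reward batch, then
    # apply an exponential yaw penalty similar in spirit to _reward_yaw.
    yaw_deg = torch.nan_to_num(yaw_deg, nan=0.0)
    yaw_rad = yaw_deg / 180.0 * 3.14159
    return torch.exp(yaw_lambda * torch.abs(yaw_rad))

Note that this only masks the symptom in the reward; the underlying simulation state is still corrupted, so the affected environments would probably also need to be reset.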
@Nam-dada I can try to help you on this.
How long does it take before it starts returning nan? Do you have a self-contained reproduction script?
Hello,
I uploaded a file, all_actions.pkl, to Google Drive to help you reproduce the bug. The file comes from my own experiment. Note that NaN values start appearing from index 554 in this file; however, when I attempted to reproduce the bug locally, the reward returned by the environment began showing NaN around index 500. The reproduction script is as follows:
import torch
from hover_env import HoverEnv
import genesis as gs
import pickle
import os

torch.set_default_device("cuda" if torch.cuda.is_available() else "cpu")
seed = 1234
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.set_float32_matmul_precision("high")


def get_cfgs():
    env_cfg = {
        "num_actions": 4,
        # termination
        "termination_if_roll_greater_than": 180,  # degree
        "termination_if_pitch_greater_than": 180,
        "termination_if_close_to_ground": 0.1,
        "termination_if_x_greater_than": 3.0,
        "termination_if_y_greater_than": 3.0,
        "termination_if_z_greater_than": 2.0,
        # base pose
        "base_init_pos": [0.0, 0.0, 1.0],
        "base_init_quat": [1.0, 0.0, 0.0, 0.0],
        "episode_length_s": 15.0,
        "at_target_threshold": 0.1,
        "resampling_time_s": 3.0,
        "simulate_action_latency": True,
        "clip_actions": 1.0,
        # visualization
        "visualize_target": False,
        "visualize_camera": False,
        "max_visualize_FPS": 60,
    }
    obs_cfg = {
        "num_obs": 17,
        "obs_scales": {
            "rel_pos": 1 / 3.0,
            "lin_vel": 1 / 3.0,
            "ang_vel": 1 / 3.14159,
        },
    }
    reward_cfg = {
        "yaw_lambda": -10.0,
        "reward_scales": {
            "target": 10.0,
            "smooth": -1e-4,
            "yaw": 0.01,
            "angular": -2e-4,
            "crash": -10.0,
        },
    }
    command_cfg = {
        "num_commands": 3,
        "pos_x_range": [-1.0, 1.0],
        "pos_y_range": [-1.0, 1.0],
        "pos_z_range": [1.0, 1.0],
    }
    return env_cfg, obs_cfg, reward_cfg, command_cfg


device = "cuda" if torch.cuda.is_available() else "cpu"
save_dir = "your_file_path"  # replace with your path
file_path = os.path.join(save_dir, "all_actions.pkl")
with open(file_path, 'rb') as f:
    all_actions = pickle.load(f)
all_actions = [torch.tensor(action, device=device) for action in all_actions]

gs.init(backend=gs.gpu, precision="32", logging_level="error")
env_cfg, obs_cfg, reward_cfg, command_cfg = get_cfgs()
env = HoverEnv(
    num_envs=10000,
    env_cfg=env_cfg,
    obs_cfg=obs_cfg,
    reward_cfg=reward_cfg,
    command_cfg=command_cfg,
    show_viewer=False,
)

for ac in all_actions:
    obs, _, reward, _, _ = env.step(ac)
    print(torch.sum(torch.isnan(reward)))  # check whether there are any nan values in the reward
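To pinpoint the first failing step, the final loop could also be written like this (a sketch; the enumerate/break bookkeeping is an addition for illustration, not part of the original script):

for i, ac in enumerate(all_actions):
    obs, _, reward, _, _ = env.step(ac)
    if torch.isnan(reward).any():
        # Report the first step index at which the reward contains nan.
        print(f"first nan reward at step {i}")
        break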
Thank you! I will have a look as soon as I have some bandwidth, hopefully tomorrow.
I can reproduce the nan issue. I will have a look ASAP. Sorry for the delay.
I had this issue with the hover env as well, and I have also hit it while building a framework for drone training (based on gym-pybullet-drones).
Drones seem to just "die", with their position and quat both becoming NaN. Hopefully this can be fixed soon.
Edit: I looked into this further; it seems the drone position often jumps to unrealistically large values, and then all drone state becomes NaN on the next step. The drone is being controlled by a random, unchanging set of RPMs.
Have you tried the latest Genesis main branch?
Just reinstalled from main and I'm having the same issue.
It might be tied to the drone's acceleration. When the set RPMs are lower / the drone is less aggressive, the issue does not occur (or occurs later).
For the record, here is a simple standalone script that triggers the issue:
import torch

import genesis as gs
from hover_env import HoverEnv

gs.init(backend=gs.cpu, precision="32", logging_level="error", seed=0)
env = HoverEnv(
    env_cfg={
        "num_actions": 4,
        # termination
        "termination_if_roll_greater_than": 180,  # degree
        "termination_if_pitch_greater_than": 180,
        "termination_if_close_to_ground": 0.1,
        "termination_if_x_greater_than": 3.0,
        "termination_if_y_greater_than": 3.0,
        "termination_if_z_greater_than": 2.0,
        # base pose
        "base_init_pos": [0.0, 0.0, 1.0],
        "base_init_quat": [1.0, 0.0, 0.0, 0.0],
        "episode_length_s": 15.0,
        "at_target_threshold": 0.1,
        "resampling_time_s": 3.0,
        "simulate_action_latency": True,
        "clip_actions": 1.0,
        # visualization
        "visualize_target": False,
        "visualize_camera": False,
        "max_visualize_FPS": 60,
    },
    obs_cfg={
        "num_obs": 17,
        "obs_scales": {
            "rel_pos": 1 / 3.0,
            "lin_vel": 1 / 3.0,
            "ang_vel": 1 / 3.14159,
        },
    },
    reward_cfg={
        "yaw_lambda": -10.0,
        "reward_scales": {
            "target": 10.0,
            "smooth": -1e-4,
            "yaw": 0.01,
            "angular": -2e-4,
            "crash": -10.0,
        },
    },
    command_cfg={
        "num_commands": 3,
        "pos_x_range": [-1.0, 1.0],
        "pos_y_range": [-1.0, 1.0],
        "pos_z_range": [1.0, 1.0],
    },
    show_viewer=False,
    num_envs=1,
)

action = torch.tensor([[-0.5, 0.77, -1.0, 0.99]], dtype=gs.tc_float, device=gs.device)

i = 0
iter_exploding = -1
env.reset()
while True:
    env.step(action)
    num_constraints = env.scene.rigid_solver.constraint_solver.n_constraints.to_torch().item()
    acc = env.scene.rigid_solver.dofs_state.acc.to_torch()[:, 0]
    print(f"[{i}] num constraints: {num_constraints}")
    print(f"[{i}] acc: {acc}")
    i += 1
    if iter_exploding != -1:
        if torch.isnan(acc).any():
            break
    else:
        if (torch.abs(acc) > 1e5).any():
            iter_exploding = i
I will try to find time to investigate. The Cartesian acceleration seems to explode at some point, while the angular acceleration becomes very large but the simulation apparently copes with it.
[93] num constraints: 0
[93] acc: tensor([-9.8023e+00, -3.9377e+00, 9.5481e-01, -1.6567e+09, -9.1273e+09,
-3.2211e+05])
[94] num constraints: 0
[94] acc: tensor([-9.5669e+00, 1.3682e+01, -3.6455e+00, 5.2567e+12, 1.9517e+11,
2.7756e+07])
[95] num constraints: 0
[95] acc: tensor([ 3.4887e+11, -2.6254e+12, 1.4832e+12, 1.7900e+23, 5.8323e+21,
1.6306e+19])
[96] num constraints: 0
[96] acc: tensor([nan, nan, nan, nan, nan, nan])
It may be worth adding some very small damping on the floating base to emulate energy dissipation due to air drag.
The problem is that the angular velocity of the drone becomes too large. A physical drone normally does not rotate faster than roughly 20 rad/s. The easiest solution is to add damping to the drone.
Solution:
# In the HoverEnv class, right after self.scene.build(n_envs=num_envs),
# add the following line:
self.drone.set_dofs_damping(
    torch.tensor([0.0, 0.0, 0.0, 1e-4, 1e-4, 1e-4])
)  # set damping to a small value to avoid numerical instability
Additionally, you can combine damping with angular-velocity clipping, since data collected beyond a certain rotation speed is useless for RL training.
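A minimal sketch of such clipping at the observation level (assuming the (num_envs, 3) body-frame angular velocity that hover_env.py stores as self.base_ang_vel; the 20 rad/s limit is just the ballpark figure mentioned above, not a tuned value):

import torch

MAX_ANG_VEL = 20.0  # rad/s, rough physical limit of a real quadrotor (assumption)


def clip_ang_vel(base_ang_vel: torch.Tensor, max_ang_vel: float = MAX_ANG_VEL) -> torch.Tensor:
    # Clamp the body-frame angular velocity before it is scaled into the
    # observation or used in the angular-velocity reward term.
    return torch.clamp(base_ang_vel, -max_ang_vel, max_ang_vel)

Clipping only the values fed to the policy does not stop the simulation itself from blowing up, so it should be combined with the damping above.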
There is still something strange happening here. The Cartesian acceleration is exploding in a single step. This should not happen.