IsaacGymEnvs
Factory environment on remote server, Segmentation fault (core dumped)
Thanks for the great work on IsaacGymEnvs! When I ran the demo 'python train.py task=FactoryTaskNutBoltScrew', I got 'Segmentation fault (core dumped)'. The output is as follows:
Importing module 'gym_37' (/home/slc/env/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_37.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/slc/env/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
Warning: Gym version v0.24.0 has a number of critical issues with `gym.make` such that the `reset` and `step` functions are called before returning the environment. It is recommend to downgrading to v0.23.1 or upgrading to v0.25.1
train.py:49: UserWarning:
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
@hydra.main(config_name="config", config_path="./cfg")
/home/slc/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'config': Defaults list is missing `_self_`. See https://hydra.cc/docs/upgrades/1.0_to_1.1/default_composition_order for more information
warnings.warn(msg, UserWarning)
/home/slc/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/_internal/defaults_list.py:415: UserWarning: In config: Invalid overriding of hydra/job_logging:
Default list overrides requires 'override' keyword.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/defaults_list_override for more information.
deprecation_warning(msg)
/home/slc/miniconda3/envs/rlgpu/lib/python3.7/site-packages/hydra/_internal/hydra.py:127: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
configure_logging=with_log_configuration,
/home/slc/miniconda3/envs/rlgpu/lib/python3.7/site-packages/torch/utils/cpp_extension.py:3: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
PyTorch version 1.8.1
Device count 2
/home/slc/env/isaacgym/python/isaacgym/_bindings/src/gymtorch
Using /home/slc/.cache/torch_extensions as PyTorch extensions root...
Emitting ninja build file /home/slc/.cache/torch_extensions/gymtorch/build.ninja...
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module gymtorch...
/home/slc/env/isaacgym/python/isaacgym/torch_utils.py:135: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
def get_axis_params(value, axis_idx, x_value=0., dtype=np.float, n_dims=3):
2022-11-04 16:54:09,740 - INFO - logger - logger initialized
<unknown>:6: DeprecationWarning: invalid escape sequence \*
Error: FBX library failed to load - importing FBX data will not succeed. Message: No module named 'fbx'
FBX tools must be installed from https://help.autodesk.com/view/FBX/2020/ENU/?guid=FBX_Developer_Help_scripting_with_python_fbx_installing_python_fbx_html
/home/slc/miniconda3/envs/rlgpu/lib/python3.7/site-packages/torch/utils/tensorboard/__init__.py:3: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if not hasattr(tensorboard, '__version__') or LooseVersion(tensorboard.__version__) < LooseVersion('1.15'):
task:
name: FactoryTaskNutBoltScrew
physics_engine: physx
sim:
use_gpu_pipeline: True
up_axis: z
dt: 0.016667
gravity: [0.0, 0.0, -9.81]
disable_gravity: False
env:
numEnvs: 128
numObservations: 32
numActions: 12
randomize:
franka_arm_initial_dof_pos: [0.0015178, -0.19651, -0.0014364, -1.9761, -0.00027717, 1.7796, 0.78556]
nut_rot_initial: 30.0
rl:
pos_action_scale: [0.1, 0.1, 0.1]
rot_action_scale: [0.1, 0.1, 0.1]
force_action_scale: [1.0, 1.0, 1.0]
torque_action_scale: [1.0, 1.0, 1.0]
unidirectional_rot: True
unidirectional_force: False
clamp_rot: True
clamp_rot_thresh: 1e-06
add_obs_finger_force: False
keypoint_reward_scale: 1.0
action_penalty_scale: 0.0
max_episode_length: 4096
far_error_thresh: 0.1
success_bonus: 0.0
ctrl:
ctrl_type: operational_space_motion
all:
jacobian_type: geometric
gripper_prop_gains: [100, 100]
gripper_deriv_gains: [1, 1]
gym_default:
ik_method: dls
joint_prop_gains: [40, 40, 40, 40, 40, 40, 40]
joint_deriv_gains: [8, 8, 8, 8, 8, 8, 8]
gripper_prop_gains: [500, 500]
gripper_deriv_gains: [20, 20]
joint_space_ik:
ik_method: dls
joint_prop_gains: [1, 1, 1, 1, 1, 1, 1]
joint_deriv_gains: [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
joint_space_id:
ik_method: dls
joint_prop_gains: [40, 40, 40, 40, 40, 40, 40]
joint_deriv_gains: [8, 8, 8, 8, 8, 8, 8]
task_space_impedance:
motion_ctrl_axes: [1, 1, 1, 1, 1, 1]
task_prop_gains: [40, 40, 40, 40, 40, 40]
task_deriv_gains: [8, 8, 8, 8, 8, 8]
operational_space_motion:
motion_ctrl_axes: [0, 0, 1, 0, 0, 1]
task_prop_gains: [1, 1, 1, 1, 1, 100]
task_deriv_gains: [1, 1, 1, 1, 1, 1]
open_loop_force:
force_ctrl_axes: [0, 0, 1, 0, 0, 0]
closed_loop_force:
force_ctrl_axes: [0, 0, 1, 0, 0, 0]
wrench_prop_gains: [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
hybrid_force_motion:
motion_ctrl_axes: [1, 1, 0, 1, 1, 1]
task_prop_gains: [40, 40, 40, 40, 40, 40]
task_deriv_gains: [8, 8, 8, 8, 8, 8]
force_ctrl_axes: [0, 0, 1, 0, 0, 0]
wrench_prop_gains: [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
train:
params:
seed: 42
algo:
name: a2c_continuous
model:
name: continuous_a2c_logstd
network:
name: actor_critic
separate: False
space:
continuous:
mu_activation: None
sigma_activation: None
mu_init:
name: default
sigma_init:
name: const_initializer
val: 0
fixed_sigma: True
mlp:
units: [256, 128, 64]
activation: elu
d2rl: False
initializer:
name: default
regularizer:
name: None
load_checkpoint: True
load_path: /home/slc/env/IsaacGymEnvs-main/isaacgymenvs/runs/FactoryTaskNutBoltScrew/nn/FactoryTaskNutBoltScrew.pth
config:
name: FactoryTaskNutBoltScrew
full_experiment_name: FactoryTaskNutBoltScrew
env_name: rlgpu
multi_gpu: False
ppo: True
mixed_precision: True
normalize_input: True
normalize_value: True
value_bootstrap: True
num_actors: 128
reward_shaper:
scale_value: 1.0
normalize_advantage: True
gamma: 0.99
tau: 0.95
learning_rate: 0.0001
lr_schedule: fixed
schedule_type: standard
kl_threshold: 0.016
score_to_win: 20000
max_epochs: 1024
save_best_after: 50
save_frequency: 100
print_stats: True
grad_norm: 1.0
entropy_coef: 0.0
truncate_grads: False
e_clip: 0.2
horizon_length: 32
minibatch_size: 512
mini_epochs: 8
critic_coef: 2
clip_value: True
seq_len: 4
bounds_loss_coef: 0.0001
device: cuda:0
task_name: FactoryTaskNutBoltScrew
experiment:
num_envs:
seed: 42
torch_deterministic: False
max_iterations:
physics_engine: physx
pipeline: gpu
sim_device: cuda:0
rl_device: cuda:0
graphics_device_id: 0
num_threads: 4
solver_type: 1
num_subscenes: 4
test: False
checkpoint: /home/slc/env/IsaacGymEnvs-main/isaacgymenvs/runs/FactoryTaskNutBoltScrew/nn/FactoryTaskNutBoltScrew.pth
multi_gpu: False
wandb_activate: False
wandb_group:
wandb_name: FactoryTaskNutBoltScrew
wandb_entity:
wandb_project: isaacgymenvs
capture_video: False
capture_video_freq: 1464
capture_video_len: 100
force_render: True
headless: False
Setting seed: 42
self.seed = 42
Started to train
Exact experiment name requested from command line: FactoryTaskNutBoltScrew
/home/slc/miniconda3/envs/rlgpu/lib/python3.7/site-packages/gym/spaces/box.py:112: UserWarning: WARN: Box bound precision lowered by casting to float32
logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: enabled
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
/home/slc/env/IsaacGymEnvs-main/isaacgymenvs/tasks/factory/factory_base.py:507: UserWarning: WARN: Please be patient: SDFs may be generating, which may take a few minutes. Terminating prematurely may result in a corrupted SDF cache.
logger.warn('Please be patient: SDFs may be generating, which may take a few minutes. Terminating prematurely may result in a corrupted SDF cache.')
Using SDF cache directory '/home/slc/.isaacgym/sdf_V100'
~!~!~! Loaded/Cooked SDF triangle mesh 0 @ 0x55b23f95ae80, resolution=256, spacing=0.000108
~!~! Bounds: (-0.012000, 0.012000) (-0.013856, 0.013856) (0.016000, 0.029000)
~!~! Extents: (0.024000, 0.027712, 0.013000)
~!~! Resolution: (222, 256, 121)
~!~!~! Loaded/Cooked SDF triangle mesh 1 @ 0x55b245831410, resolution=512, spacing=0.000080
~!~! Bounds: (-0.012000, 0.012000) (-0.012000, 0.012000) (0.000000, 0.041000)
~!~! Extents: (0.024000, 0.024000, 0.041000)
~!~! Resolution: (300, 300, 512)
~!~!~! Loaded/Cooked SDF triangle mesh 2 @ 0x55b23e473a10, resolution=256, spacing=0.000108
~!~! Bounds: (-0.012000, 0.012000) (-0.013856, 0.013856) (0.016000, 0.029000)
~!~! Extents: (0.024000, 0.027712, 0.013000)
~!~! Resolution: (222, 256, 121)
~!~!~! Loaded/Cooked SDF triangle mesh 3 @ 0x55b249372f10, resolution=512, spacing=0.000080
~!~! Bounds: (-0.012000, 0.012000) (-0.012000, 0.012000) (0.000000, 0.041000)
~!~! Extents: (0.024000, 0.024000, 0.041000)
~!~! Resolution: (300, 300, 512)
Box(-1.0, 1.0, (12,), float32) Box(-inf, inf, (32,), float32)
current training device: cuda:0
build mlp: 32
RunningMeanStd: (1,)
RunningMeanStd: (32,)
=> loading checkpoint '/home/slc/env/IsaacGymEnvs-main/isaacgymenvs/runs/FactoryTaskNutBoltScrew/nn/FactoryTaskNutBoltScrew.pth'
/home/slc/env/IsaacGymEnvs-main/isaacgymenvs/tasks/factory/factory_control.py:145: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
task_wrench = task_wrench + torch.tensor(cfg_ctrl['motion_ctrl_axes'], device=device).unsqueeze(0) * task_wrench_motion
Unhandled descriptor set 433
Unhandled descriptor set 1176522960
Unhandled descriptor set 1177337120
Segmentation fault (core dumped)
If I set 'headless=True', the error disappears. I am aware that the remote server needs a graphical display, so I use X11. The same setup works well with other demos such as Ant and ShadowHand, so I am confused about why the error appears in the Factory environment. I also tried the approach from #22, updating the NVIDIA driver, but that did not work either.
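One thing that stands out in the log above is the lavapipe warning. lavapipe is Mesa's CPU-based Vulkan implementation, so the viewer may be running on a software renderer rather than the NVIDIA GPU. Whether that is what causes the crash here is an assumption, not something confirmed. A minimal sketch for pulling these lines out of a captured log follows; the marker strings are copied from the output above, while `scan_log` and the notes attached to each marker are hypothetical and only illustrative.

```python
# Marker substrings copied from the log above. The notes are guesses about
# what each line might indicate, not confirmed diagnoses.
MARKERS = {
    "lavapipe": "Vulkan may be using Mesa's software renderer, not the NVIDIA GPU",
    "Unhandled descriptor set": "the renderer reported descriptor-set problems",
    "Segmentation fault": "the process crashed",
}


def scan_log(text):
    """Return (line_number, marker, note) for every log line containing a marker."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        for marker, note in MARKERS.items():
            if marker.lower() in lowered:
                hits.append((lineno, marker, note))
    return hits


if __name__ == "__main__":
    # A few lines taken from the end of the log above.
    sample = "\n".join([
        "GPU Pipeline: enabled",
        "WARNING: lavapipe is not a conformant vulkan implementation, testing use only.",
        "Unhandled descriptor set 433",
        "Segmentation fault (core dumped)",
    ])
    for lineno, marker, note in scan_log(sample):
        print(f"line {lineno}: found '{marker}' ({note})")
```

If the lavapipe line shows up only when the viewer is enabled and not with 'headless=True', that would be consistent with the crash being tied to rendering rather than to the physics simulation.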
Hi, I have hit the same error. Have you solved it?
Not yet.