Cannot run the PointNav experiments on a remote server (no screen)

Open danelpeng opened this issue 1 year ago • 0 comments

Hi, when I ran the code according to the README on a remote server (no screen), some issues arose. First, I ran python scripts/startx.py. Then I ran bash train_navigation_agents.sh. However, training does not seem to have started: the output stops advancing and the run appears to be stuck.

The output (with NUM_PROCESSES = 6) is:

10/09 16:55:56 INFO: Running with args Namespace(approx_ckpt_steps_count=None, checkpoint=None, color_jitter=False, const_rotate=False, const_translate=False, deterministic_agents=False, deterministic_cudnn=False, disable_config_saving=False, disable_tensorboard=False, drift=False, drift_degrees=1.15, dyn_corr_mode=False, experiment='pointnav_robothor_vanilla_rgb_resnet_ddppo', experiment_base='projects/robustnav_baselines/experiments/robustnav_train', extra_tag='rnav_pointnav_vanilla_rgb_resnet_ddppo_clean', gp=None, log_level='info', max_sampler_processes_per_worker=None, motor_failure=False, output_dir='storage/robothor-pointnav-rgb-resnetgru-ddppo', random_crop=False, random_shift=False, restart_pipeline=False, seed=12345, skip_checkpoints=0, stoch_rotate=False, stoch_translate=False, test_dataset=None, test_date=None, test_gpus=None, training_dataset=None, validation_dataset=None, visual_corruption=None, visual_severity=0)	[main.py: 469]
10/09 16:55:57 INFO: [5]	[system.py: 128]
10/09 16:55:57 INFO: Sensor UUID is  rgb_lowres	[system.py: 128]
10/09 16:55:57 INFO: Applied corruptions are	[system.py: 128]
10/09 16:55:57 INFO: None	[system.py: 128]
10/09 16:55:57 INFO: 0	[system.py: 128]
10/09 16:55:57 INFO: Random Crop state  False	[system.py: 128]
10/09 16:55:57 INFO: Color Jitter state  False	[system.py: 128]
10/09 16:55:57 INFO: Random Translate  False	[system.py: 128]
10/09 16:55:57 INFO: Git diff saved to storage/robothor-pointnav-rgb-resnetgru-ddppo/used_configs/Pointnav-RoboTHOR-Vanilla-RGB-ResNet-DDPPO/rnav_pointnav_vanilla_rgb_resnet_ddppo_clean/2023-10-09_16-55-57	[runner.py: 479]
10/09 16:55:57 INFO: Config files saved to storage/robothor-pointnav-rgb-resnetgru-ddppo/used_configs/Pointnav-RoboTHOR-Vanilla-RGB-ResNet-DDPPO/rnav_pointnav_vanilla_rgb_resnet_ddppo_clean/2023-10-09_16-55-57	[runner.py: 516]
10/09 16:55:57 INFO: Using 6 train workers on devices (device(type='cuda', index=0), device(type='cuda', index=1), device(type='cuda', index=2), device(type='cuda', index=3), device(type='cuda', index=4), device(type='cuda', index=5))	[runner.py: 154]
10/09 16:56:00 INFO: train 0 args {'experiment_name': 'Pointnav-RoboTHOR-Vanilla-RGB-ResNet-DDPPO_rnav_pointnav_vanilla_rgb_resnet_ddppo_clean', 'config': <projects.robustnav_baselines.experiments.robustnav_train.pointnav_robothor_vanilla_rgb_resnet_ddppo.PointNavS2SRGBResNetDDPPO object at 0x7fd4f24d4190>, 'results_queue': <multiprocessing.queues.Queue object at 0x7fd3a32ebf40>, 'checkpoints_queue': <multiprocessing.queues.Queue object at 0x7fd3a3288bb0>, 'checkpoints_dir': 'storage/robothor-pointnav-rgb-resnetgru-ddppo/checkpoints/Pointnav-RoboTHOR-Vanilla-RGB-ResNet-DDPPO/rnav_pointnav_vanilla_rgb_resnet_ddppo_clean/2023-10-09_16-55-57', 'seed': 12345, 'deterministic_cudnn': False, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fd3a3293430>, 'num_workers': 6, 'device': device(type='cuda', index=0), 'distributed_port': 51385, 'max_sampler_processes_per_worker': None, 'initial_model_state_dict': '[SUPRESSED]', 'mode': 'train', 'worker_id': 0}	[runner.py: 208]
10/09 16:56:03 INFO: train 1 args {'experiment_name': 'Pointnav-RoboTHOR-Vanilla-RGB-ResNet-DDPPO_rnav_pointnav_vanilla_rgb_resnet_ddppo_clean', 'config': <projects.robustnav_baselines.experiments.robustnav_train.pointnav_robothor_vanilla_rgb_resnet_ddppo.PointNavS2SRGBResNetDDPPO object at 0x7fd4f24d4190>, 'results_queue': <multiprocessing.queues.Queue object at 0x7fd367ac2f40>, 'checkpoints_queue': <multiprocessing.queues.Queue object at 0x7fd367a5fbb0>, 'checkpoints_dir': 'storage/robothor-pointnav-rgb-resnetgru-ddppo/checkpoints/Pointnav-RoboTHOR-Vanilla-RGB-ResNet-DDPPO/rnav_pointnav_vanilla_rgb_resnet_ddppo_clean/2023-10-09_16-55-57', 'seed': 12345, 'deterministic_cudnn': False, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fd367a6a430>, 'num_workers': 6, 'device': device(type='cuda', index=1), 'distributed_port': 51385, 'max_sampler_processes_per_worker': None, 'initial_model_state_dict': '[SUPRESSED]', 'mode': 'train', 'worker_id': 1}	[runner.py: 208]
10/09 16:56:06 INFO: train 2 args {'experiment_name': 'Pointnav-RoboTHOR-Vanilla-RGB-ResNet-DDPPO_rnav_pointnav_vanilla_rgb_resnet_ddppo_clean', 'config': <projects.robustnav_baselines.experiments.robustnav_train.pointnav_robothor_vanilla_rgb_resnet_ddppo.PointNavS2SRGBResNetDDPPO object at 0x7fd4f24d4190>, 'results_queue': <multiprocessing.queues.Queue object at 0x7fd3b32ebf40>, 'checkpoints_queue': <multiprocessing.queues.Queue object at 0x7fd3b3288bb0>, 'checkpoints_dir': 'storage/robothor-pointnav-rgb-resnetgru-ddppo/checkpoints/Pointnav-RoboTHOR-Vanilla-RGB-ResNet-DDPPO/rnav_pointnav_vanilla_rgb_resnet_ddppo_clean/2023-10-09_16-55-57', 'seed': 12345, 'deterministic_cudnn': False, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fd3b3293430>, 'num_workers': 6, 'device': device(type='cuda', index=2), 'distributed_port': 51385, 'max_sampler_processes_per_worker': None, 'initial_model_state_dict': '[SUPRESSED]', 'mode': 'train', 'worker_id': 2}	[runner.py: 208]
10/09 16:56:09 INFO: train 3 args {'experiment_name': 'Pointnav-RoboTHOR-Vanilla-RGB-ResNet-DDPPO_rnav_pointnav_vanilla_rgb_resnet_ddppo_clean', 'config': <projects.robustnav_baselines.experiments.robustnav_train.pointnav_robothor_vanilla_rgb_resnet_ddppo.PointNavS2SRGBResNetDDPPO object at 0x7fd4f24d4190>, 'results_queue': <multiprocessing.queues.Queue object at 0x7fd3895faf40>, 'checkpoints_queue': <multiprocessing.queues.Queue object at 0x7fd389597bb0>, 'checkpoints_dir': 'storage/robothor-pointnav-rgb-resnetgru-ddppo/checkpoints/Pointnav-RoboTHOR-Vanilla-RGB-ResNet-DDPPO/rnav_pointnav_vanilla_rgb_resnet_ddppo_clean/2023-10-09_16-55-57', 'seed': 12345, 'deterministic_cudnn': False, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fd3895a2430>, 'num_workers': 6, 'device': device(type='cuda', index=3), 'distributed_port': 51385, 'max_sampler_processes_per_worker': None, 'initial_model_state_dict': '[SUPRESSED]', 'mode': 'train', 'worker_id': 3}	[runner.py: 208]
10/09 16:56:12 INFO: train 4 args {'experiment_name': 'Pointnav-RoboTHOR-Vanilla-RGB-ResNet-DDPPO_rnav_pointnav_vanilla_rgb_resnet_ddppo_clean', 'config': <projects.robustnav_baselines.experiments.robustnav_train.pointnav_robothor_vanilla_rgb_resnet_ddppo.PointNavS2SRGBResNetDDPPO object at 0x7fd4f24d4190>, 'results_queue': <multiprocessing.queues.Queue object at 0x7fd380544f40>, 'checkpoints_queue': <multiprocessing.queues.Queue object at 0x7fd3804e0bb0>, 'checkpoints_dir': 'storage/robothor-pointnav-rgb-resnetgru-ddppo/checkpoints/Pointnav-RoboTHOR-Vanilla-RGB-ResNet-DDPPO/rnav_pointnav_vanilla_rgb_resnet_ddppo_clean/2023-10-09_16-55-57', 'seed': 12345, 'deterministic_cudnn': False, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fd3804ec430>, 'num_workers': 6, 'device': device(type='cuda', index=4), 'distributed_port': 51385, 'max_sampler_processes_per_worker': None, 'initial_model_state_dict': '[SUPRESSED]', 'mode': 'train', 'worker_id': 4}	[runner.py: 208]
10/09 16:56:16 INFO: Started 6 train processes	[runner.py: 294]
10/09 16:56:16 INFO: Using 1 valid workers on devices (device(type='cuda', index=5),)	[runner.py: 154]
10/09 16:56:16 INFO: train 5 args {'experiment_name': 'Pointnav-RoboTHOR-Vanilla-RGB-ResNet-DDPPO_rnav_pointnav_vanilla_rgb_resnet_ddppo_clean', 'config': <projects.robustnav_baselines.experiments.robustnav_train.pointnav_robothor_vanilla_rgb_resnet_ddppo.PointNavS2SRGBResNetDDPPO object at 0x7fd4f24d5190>, 'results_queue': <multiprocessing.queues.Queue object at 0x7fd3ab2e4f40>, 'checkpoints_queue': <multiprocessing.queues.Queue object at 0x7fd3ab281bb0>, 'checkpoints_dir': 'storage/robothor-pointnav-rgb-resnetgru-ddppo/checkpoints/Pointnav-RoboTHOR-Vanilla-RGB-ResNet-DDPPO/rnav_pointnav_vanilla_rgb_resnet_ddppo_clean/2023-10-09_16-55-57', 'seed': 12345, 'deterministic_cudnn': False, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fd3ab28c430>, 'num_workers': 6, 'device': device(type='cuda', index=5), 'distributed_port': 51385, 'max_sampler_processes_per_worker': None, 'initial_model_state_dict': '[SUPRESSED]', 'mode': 'train', 'worker_id': 5}	[runner.py: 208]
10/09 16:56:19 INFO: Started 1 valid processes	[runner.py: 320]
10/09 16:56:19 INFO: valid 0 args {'config': <projects.robustnav_baselines.experiments.robustnav_train.pointnav_robothor_vanilla_rgb_resnet_ddppo.PointNavS2SRGBResNetDDPPO object at 0x7fd4f24d4190>, 'results_queue': <multiprocessing.queues.Queue object at 0x7fd3bbd8df70>, 'checkpoints_queue': <multiprocessing.queues.Queue object at 0x7fd3bbd29be0>, 'seed': 12345, 'deterministic_cudnn': False, 'deterministic_agents': False, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fd3bbd33460>, 'device': device(type='cuda', index=5), 'max_sampler_processes_per_worker': None, 'mode': 'valid', 'worker_id': 0}	[runner.py: 223]
10/09 16:56:25 INFO: Actor critic model has no attribute inverse mode	[system.py: 128]
10/09 16:56:25 INFO: Actor critic model has no attribute rotation mode	[system.py: 128]
10/09 16:56:25 INFO: Actor critic model has no attribute inverse mode	[system.py: 128]
10/09 16:56:25 INFO: Actor critic model has no attribute rotation mode	[system.py: 128]
10/09 16:56:25 INFO: Actor critic model has no attribute inverse mode	[system.py: 128]
10/09 16:56:25 INFO: Actor critic model has no attribute rotation mode	[system.py: 128]
10/09 16:56:25 INFO: Actor critic model has no attribute inverse mode	[system.py: 128]
10/09 16:56:25 INFO: Actor critic model has no attribute rotation mode	[system.py: 128]
10/09 16:56:25 INFO: Actor critic model has no attribute inverse mode	[system.py: 128]
10/09 16:56:25 INFO: Actor critic model has no attribute rotation mode	[system.py: 128]
10/09 16:56:25 INFO: Actor critic model has no attribute inverse mode	[system.py: 128]
10/09 16:56:25 INFO: Starting 0-th VectorSampledTask worker with args [{'scenes': ['FloorPlan_Train6_1', 'FloorPlan_Train11_5', 'FloorPlan_Train3_5', 'FloorPlan_Train7_1', 'FloorPlan_Train8_1', 'FloorPlan_Train10_3', 'FloorPlan_Train4_1', 'FloorPlan_Train12_4', 'FloorPlan_Train10_5', 'FloorPlan_Train3_2'], 'max_steps': 500, 'sensors': [<allenact_plugins.ithor_plugin.ithor_sensors.RGBSensorThor object at 0x7fd4f24d4280>, <allenact_plugins.robothor_plugin.robothor_sensors.GPSCompassSensorRoboThor object at 0x7fd3b32ebdc0>], 'action_space': Discrete(4), 'seed': 1282648386, 'deterministic_cudnn': False, 'rewards_config': {'step_penalty': -0.01, 'goal_success_reward': 10.0, 'failed_stop_reward': 0.0, 'reached_max_steps_reward': 0.0, 'shaping_weight': 1.0}, 'env_args': {'width': 400, 'height': 300, 'continuousMode': True, 'applyActionNoise': True, 'agentType': 'stochastic', 'rotateStepDegrees': 30.0, 'gridSize': 0.25, 'snapToGrid': False, 'agentMode': 'locobot', 'fieldOfView': 63.453048374758716, 'include_private_scenes': False, 'renderDepthImage': False, 'x_display': '0.2'}, 'scene_directory': '/data1/home/robustnav/datasets/robothor-pointnav/train', 'loop_dataset': True, 'allow_flipping': True, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fd3b3293430>}]	[vector_sampled_tasks.py: 344]
10/09 16:56:25 INFO: Actor critic model has no attribute rotation mode	[system.py: 128]
10/09 16:56:25 INFO: Starting 0-th VectorSampledTask worker with args [{'scenes': ['FloorPlan_Train10_1', 'FloorPlan_Train2_3', 'FloorPlan_Train2_2', 'FloorPlan_Train5_1', 'FloorPlan_Train8_4', 'FloorPlan_Train7_3', 'FloorPlan_Train3_1', 'FloorPlan_Train12_5', 'FloorPlan_Train7_5', 'FloorPlan_Train7_4'], 'max_steps': 500, 'sensors': [<allenact_plugins.ithor_plugin.ithor_sensors.RGBSensorThor object at 0x7fd4f24d4280>, <allenact_plugins.robothor_plugin.robothor_sensors.GPSCompassSensorRoboThor object at 0x7fd367ac2dc0>], 'action_space': Discrete(4), 'seed': 43676229, 'deterministic_cudnn': False, 'rewards_config': {'step_penalty': -0.01, 'goal_success_reward': 10.0, 'failed_stop_reward': 0.0, 'reached_max_steps_reward': 0.0, 'shaping_weight': 1.0}, 'env_args': {'width': 400, 'height': 300, 'continuousMode': True, 'applyActionNoise': True, 'agentType': 'stochastic', 'rotateStepDegrees': 30.0, 'gridSize': 0.25, 'snapToGrid': False, 'agentMode': 'locobot', 'fieldOfView': 63.453048374758716, 'include_private_scenes': False, 'renderDepthImage': False, 'x_display': '0.1'}, 'scene_directory': '/data1/home/robustnav/datasets/robothor-pointnav/train', 'loop_dataset': True, 'allow_flipping': True, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fd367a6a430>}]	[vector_sampled_tasks.py: 344]
10/09 16:56:25 INFO: Starting 0-th VectorSampledTask worker with args [{'scenes': ['FloorPlan_Train8_5', 'FloorPlan_Train12_2', 'FloorPlan_Train9_1', 'FloorPlan_Train11_1', 'FloorPlan_Train4_2', 'FloorPlan_Train5_5', 'FloorPlan_Train1_2', 'FloorPlan_Train9_2', 'FloorPlan_Train1_3', 'FloorPlan_Train1_1'], 'max_steps': 500, 'sensors': [<allenact_plugins.ithor_plugin.ithor_sensors.RGBSensorThor object at 0x7fd4f24d4280>, <allenact_plugins.robothor_plugin.robothor_sensors.GPSCompassSensorRoboThor object at 0x7fd3a32ebdc0>], 'action_space': Discrete(4), 'seed': 1789368711, 'deterministic_cudnn': False, 'rewards_config': {'step_penalty': -0.01, 'goal_success_reward': 10.0, 'failed_stop_reward': 0.0, 'reached_max_steps_reward': 0.0, 'shaping_weight': 1.0}, 'env_args': {'width': 400, 'height': 300, 'continuousMode': True, 'applyActionNoise': True, 'agentType': 'stochastic', 'rotateStepDegrees': 30.0, 'gridSize': 0.25, 'snapToGrid': False, 'agentMode': 'locobot', 'fieldOfView': 63.453048374758716, 'include_private_scenes': False, 'renderDepthImage': False, 'x_display': '0.0'}, 'scene_directory': '/data1/home/robustnav/datasets/robothor-pointnav/train', 'loop_dataset': True, 'allow_flipping': True, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fd3a3293430>}]	[vector_sampled_tasks.py: 344]
10/09 16:56:25 INFO: Starting 0-th VectorSampledTask worker with args [{'scenes': ['FloorPlan_Train9_4', 'FloorPlan_Train8_3', 'FloorPlan_Train10_2', 'FloorPlan_Train5_4', 'FloorPlan_Train11_3', 'FloorPlan_Train9_3', 'FloorPlan_Train6_2', 'FloorPlan_Train4_3', 'FloorPlan_Train3_3', 'FloorPlan_Train4_5'], 'max_steps': 500, 'sensors': [<allenact_plugins.ithor_plugin.ithor_sensors.RGBSensorThor object at 0x7fd4f24d4280>, <allenact_plugins.robothor_plugin.robothor_sensors.GPSCompassSensorRoboThor object at 0x7fd3895fadc0>], 'action_space': Discrete(4), 'seed': 1582316135, 'deterministic_cudnn': False, 'rewards_config': {'step_penalty': -0.01, 'goal_success_reward': 10.0, 'failed_stop_reward': 0.0, 'reached_max_steps_reward': 0.0, 'shaping_weight': 1.0}, 'env_args': {'width': 400, 'height': 300, 'continuousMode': True, 'applyActionNoise': True, 'agentType': 'stochastic', 'rotateStepDegrees': 30.0, 'gridSize': 0.25, 'snapToGrid': False, 'agentMode': 'locobot', 'fieldOfView': 63.453048374758716, 'include_private_scenes': False, 'renderDepthImage': False, 'x_display': '0.3'}, 'scene_directory': '/data1/home/robustnav/datasets/robothor-pointnav/train', 'loop_dataset': True, 'allow_flipping': True, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fd3895a2430>}]	[vector_sampled_tasks.py: 344]
10/09 16:56:25 INFO: Starting 0-th VectorSampledTask worker with args [{'scenes': ['FloorPlan_Train6_3', 'FloorPlan_Train12_3', 'FloorPlan_Train12_1', 'FloorPlan_Train5_3', 'FloorPlan_Train3_4', 'FloorPlan_Train1_4', 'FloorPlan_Train11_2', 'FloorPlan_Train5_2', 'FloorPlan_Train2_1', 'FloorPlan_Train6_5'], 'max_steps': 500, 'sensors': [<allenact_plugins.ithor_plugin.ithor_sensors.RGBSensorThor object at 0x7fd4f24d5280>, <allenact_plugins.robothor_plugin.robothor_sensors.GPSCompassSensorRoboThor object at 0x7fd3ab2e4dc0>], 'action_space': Discrete(4), 'seed': 1160692746, 'deterministic_cudnn': False, 'rewards_config': {'step_penalty': -0.01, 'goal_success_reward': 10.0, 'failed_stop_reward': 0.0, 'reached_max_steps_reward': 0.0, 'shaping_weight': 1.0}, 'env_args': {'width': 400, 'height': 300, 'continuousMode': True, 'applyActionNoise': True, 'agentType': 'stochastic', 'rotateStepDegrees': 30.0, 'gridSize': 0.25, 'snapToGrid': False, 'agentMode': 'locobot', 'fieldOfView': 63.453048374758716, 'include_private_scenes': False, 'renderDepthImage': False, 'x_display': '0.5'}, 'scene_directory': '/data1/home/robustnav/datasets/robothor-pointnav/train', 'loop_dataset': True, 'allow_flipping': True, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fd3ab28c430>}]	[vector_sampled_tasks.py: 344]
10/09 16:56:25 INFO: Starting 0-th VectorSampledTask worker with args [{'scenes': ['FloorPlan_Train8_2', 'FloorPlan_Train9_5', 'FloorPlan_Train2_5', 'FloorPlan_Train7_2', 'FloorPlan_Train4_4', 'FloorPlan_Train11_4', 'FloorPlan_Train1_5', 'FloorPlan_Train10_4', 'FloorPlan_Train6_4', 'FloorPlan_Train2_4'], 'max_steps': 500, 'sensors': [<allenact_plugins.ithor_plugin.ithor_sensors.RGBSensorThor object at 0x7fd4f24d4280>, <allenact_plugins.robothor_plugin.robothor_sensors.GPSCompassSensorRoboThor object at 0x7fd380544dc0>], 'action_space': Discrete(4), 'seed': 831769172, 'deterministic_cudnn': False, 'rewards_config': {'step_penalty': -0.01, 'goal_success_reward': 10.0, 'failed_stop_reward': 0.0, 'reached_max_steps_reward': 0.0, 'shaping_weight': 1.0}, 'env_args': {'width': 400, 'height': 300, 'continuousMode': True, 'applyActionNoise': True, 'agentType': 'stochastic', 'rotateStepDegrees': 30.0, 'gridSize': 0.25, 'snapToGrid': False, 'agentMode': 'locobot', 'fieldOfView': 63.453048374758716, 'include_private_scenes': False, 'renderDepthImage': False, 'x_display': '0.4'}, 'scene_directory': '/data1/home/robustnav/datasets/robothor-pointnav/train', 'loop_dataset': True, 'allow_flipping': True, 'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fd3804ec430>}]	[vector_sampled_tasks.py: 344]
10/09 16:56:27 INFO: Actor critic model has no attribute inverse mode	[system.py: 128]
10/09 16:56:27 INFO: Actor critic model has no attribute rotation mode	[system.py: 128]
10/09 16:56:28 INFO: Starting 0-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f46ef1bc790>, 'scenes': ['FloorPlan_Train6_1', 'FloorPlan_Train11_5', 'FloorPlan_Train3_5', 'FloorPlan_Train7_1', 'FloorPlan_Train8_1', 'FloorPlan_Train10_3', 'FloorPlan_Train4_1', 'FloorPlan_Train12_4', 'FloorPlan_Train10_5', 'FloorPlan_Train3_2'], 'max_steps': 500, 'sensors': [<allenact_plugins.ithor_plugin.ithor_sensors.RGBSensorThor object at 0x7f474b5432e0>, <allenact_plugins.robothor_plugin.robothor_sensors.GPSCompassSensorRoboThor object at 0x7f46ef1bc5b0>], 'action_space': Discrete(4), 'seed': 1282648386, 'deterministic_cudnn': False, 'rewards_config': {'step_penalty': -0.01, 'goal_success_reward': 10.0, 'failed_stop_reward': 0.0, 'reached_max_steps_reward': 0.0, 'shaping_weight': 1.0}, 'env_args': {'width': 400, 'height': 300, 'continuousMode': True, 'applyActionNoise': True, 'agentType': 'stochastic', 'rotateStepDegrees': 30.0, 'gridSize': 0.25, 'snapToGrid': False, 'agentMode': 'locobot', 'fieldOfView': 63.453048374758716, 'include_private_scenes': False, 'renderDepthImage': False, 'x_display': '0.2'}, 'scene_directory': '/data1/home/robustnav/datasets/robothor-pointnav/train', 'loop_dataset': True, 'allow_flipping': True}	[vector_sampled_tasks.py: 975]
10/09 16:56:28 INFO: Starting 0-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7fd5bdcd27f0>, 'scenes': ['FloorPlan_Train6_3', 'FloorPlan_Train12_3', 'FloorPlan_Train12_1', 'FloorPlan_Train5_3', 'FloorPlan_Train3_4', 'FloorPlan_Train1_4', 'FloorPlan_Train11_2', 'FloorPlan_Train5_2', 'FloorPlan_Train2_1', 'FloorPlan_Train6_5'], 'max_steps': 500, 'sensors': [<allenact_plugins.ithor_plugin.ithor_sensors.RGBSensorThor object at 0x7fd60907d2e0>, <allenact_plugins.robothor_plugin.robothor_sensors.GPSCompassSensorRoboThor object at 0x7fd5bdcd2610>], 'action_space': Discrete(4), 'seed': 1160692746, 'deterministic_cudnn': False, 'rewards_config': {'step_penalty': -0.01, 'goal_success_reward': 10.0, 'failed_stop_reward': 0.0, 'reached_max_steps_reward': 0.0, 'shaping_weight': 1.0}, 'env_args': {'width': 400, 'height': 300, 'continuousMode': True, 'applyActionNoise': True, 'agentType': 'stochastic', 'rotateStepDegrees': 30.0, 'gridSize': 0.25, 'snapToGrid': False, 'agentMode': 'locobot', 'fieldOfView': 63.453048374758716, 'include_private_scenes': False, 'renderDepthImage': False, 'x_display': '0.5'}, 'scene_directory': '/data1/home/robustnav/datasets/robothor-pointnav/train', 'loop_dataset': True, 'allow_flipping': True}	[vector_sampled_tasks.py: 975]
10/09 16:56:28 INFO: Starting 0-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f812fd80760>, 'scenes': ['FloorPlan_Train10_1', 'FloorPlan_Train2_3', 'FloorPlan_Train2_2', 'FloorPlan_Train5_1', 'FloorPlan_Train8_4', 'FloorPlan_Train7_3', 'FloorPlan_Train3_1', 'FloorPlan_Train12_5', 'FloorPlan_Train7_5', 'FloorPlan_Train7_4'], 'max_steps': 500, 'sensors': [<allenact_plugins.ithor_plugin.ithor_sensors.RGBSensorThor object at 0x7f81b2c202e0>, <allenact_plugins.robothor_plugin.robothor_sensors.GPSCompassSensorRoboThor object at 0x7f812fd80580>], 'action_space': Discrete(4), 'seed': 43676229, 'deterministic_cudnn': False, 'rewards_config': {'step_penalty': -0.01, 'goal_success_reward': 10.0, 'failed_stop_reward': 0.0, 'reached_max_steps_reward': 0.0, 'shaping_weight': 1.0}, 'env_args': {'width': 400, 'height': 300, 'continuousMode': True, 'applyActionNoise': True, 'agentType': 'stochastic', 'rotateStepDegrees': 30.0, 'gridSize': 0.25, 'snapToGrid': False, 'agentMode': 'locobot', 'fieldOfView': 63.453048374758716, 'include_private_scenes': False, 'renderDepthImage': False, 'x_display': '0.1'}, 'scene_directory': '/data1/home/robustnav/datasets/robothor-pointnav/train', 'loop_dataset': True, 'allow_flipping': True}	[vector_sampled_tasks.py: 975]
10/09 16:56:28 INFO: Camera-Crack  False	[system.py: 128]
10/09 16:56:28 INFO: Motor Failure Mode (Left / Right) is  False	[system.py: 128]
10/09 16:56:28 INFO: Constant (but Uniform) translate friction False	[system.py: 128]
10/09 16:56:28 INFO: Constant (but Uniform) Rotate friction False	[system.py: 128]
10/09 16:56:28 INFO: Stoch translate friction False	[system.py: 128]
10/09 16:56:28 INFO: Stoch Rotate friction False	[system.py: 128]
10/09 16:56:28 INFO: Drift Mode is  False	[system.py: 128]
10/09 16:56:28 INFO: Camera-Crack  False	[system.py: 128]
10/09 16:56:28 INFO: Motor Failure Mode (Left / Right) is  False	[system.py: 128]
10/09 16:56:28 INFO: Constant (but Uniform) translate friction False	[system.py: 128]
10/09 16:56:28 INFO: Constant (but Uniform) Rotate friction False	[system.py: 128]
10/09 16:56:28 INFO: Stoch translate friction False	[system.py: 128]
10/09 16:56:28 INFO: Stoch Rotate friction False	[system.py: 128]
10/09 16:56:28 INFO: Drift Mode is  False	[system.py: 128]
10/09 16:56:28 INFO: Starting 0-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f82c6b556d0>, 'scenes': ['FloorPlan_Train9_4', 'FloorPlan_Train8_3', 'FloorPlan_Train10_2', 'FloorPlan_Train5_4', 'FloorPlan_Train11_3', 'FloorPlan_Train9_3', 'FloorPlan_Train6_2', 'FloorPlan_Train4_3', 'FloorPlan_Train3_3', 'FloorPlan_Train4_5'], 'max_steps': 500, 'sensors': [<allenact_plugins.ithor_plugin.ithor_sensors.RGBSensorThor object at 0x7f8349f262e0>, <allenact_plugins.robothor_plugin.robothor_sensors.GPSCompassSensorRoboThor object at 0x7f82c6b554f0>], 'action_space': Discrete(4), 'seed': 1582316135, 'deterministic_cudnn': False, 'rewards_config': {'step_penalty': -0.01, 'goal_success_reward': 10.0, 'failed_stop_reward': 0.0, 'reached_max_steps_reward': 0.0, 'shaping_weight': 1.0}, 'env_args': {'width': 400, 'height': 300, 'continuousMode': True, 'applyActionNoise': True, 'agentType': 'stochastic', 'rotateStepDegrees': 30.0, 'gridSize': 0.25, 'snapToGrid': False, 'agentMode': 'locobot', 'fieldOfView': 63.453048374758716, 'include_private_scenes': False, 'renderDepthImage': False, 'x_display': '0.3'}, 'scene_directory': '/data1/home/robustnav/datasets/robothor-pointnav/train', 'loop_dataset': True, 'allow_flipping': True}	[vector_sampled_tasks.py: 975]
10/09 16:56:28 INFO: Starting 0-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f9173fbe700>, 'scenes': ['FloorPlan_Train8_5', 'FloorPlan_Train12_2', 'FloorPlan_Train9_1', 'FloorPlan_Train11_1', 'FloorPlan_Train4_2', 'FloorPlan_Train5_5', 'FloorPlan_Train1_2', 'FloorPlan_Train9_2', 'FloorPlan_Train1_3', 'FloorPlan_Train1_1'], 'max_steps': 500, 'sensors': [<allenact_plugins.ithor_plugin.ithor_sensors.RGBSensorThor object at 0x7f92403352e0>, <allenact_plugins.robothor_plugin.robothor_sensors.GPSCompassSensorRoboThor object at 0x7f9173fbe520>], 'action_space': Discrete(4), 'seed': 1789368711, 'deterministic_cudnn': False, 'rewards_config': {'step_penalty': -0.01, 'goal_success_reward': 10.0, 'failed_stop_reward': 0.0, 'reached_max_steps_reward': 0.0, 'shaping_weight': 1.0}, 'env_args': {'width': 400, 'height': 300, 'continuousMode': True, 'applyActionNoise': True, 'agentType': 'stochastic', 'rotateStepDegrees': 30.0, 'gridSize': 0.25, 'snapToGrid': False, 'agentMode': 'locobot', 'fieldOfView': 63.453048374758716, 'include_private_scenes': False, 'renderDepthImage': False, 'x_display': '0.0'}, 'scene_directory': '/data1/home/robustnav/datasets/robothor-pointnav/train', 'loop_dataset': True, 'allow_flipping': True}	[vector_sampled_tasks.py: 975]
10/09 16:56:28 INFO: Starting 0-th SingleProcessVectorSampledTasks generator with args {'mp_ctx': <multiprocessing.context.ForkServerContext object at 0x7f1a690c56a0>, 'scenes': ['FloorPlan_Train8_2', 'FloorPlan_Train9_5', 'FloorPlan_Train2_5', 'FloorPlan_Train7_2', 'FloorPlan_Train4_4', 'FloorPlan_Train11_4', 'FloorPlan_Train1_5', 'FloorPlan_Train10_4', 'FloorPlan_Train6_4', 'FloorPlan_Train2_4'], 'max_steps': 500, 'sensors': [<allenact_plugins.ithor_plugin.ithor_sensors.RGBSensorThor object at 0x7f1a9542f2e0>, <allenact_plugins.robothor_plugin.robothor_sensors.GPSCompassSensorRoboThor object at 0x7f1a690c54c0>], 'action_space': Discrete(4), 'seed': 831769172, 'deterministic_cudnn': False, 'rewards_config': {'step_penalty': -0.01, 'goal_success_reward': 10.0, 'failed_stop_reward': 0.0, 'reached_max_steps_reward': 0.0, 'shaping_weight': 1.0}, 'env_args': {'width': 400, 'height': 300, 'continuousMode': True, 'applyActionNoise': True, 'agentType': 'stochastic', 'rotateStepDegrees': 30.0, 'gridSize': 0.25, 'snapToGrid': False, 'agentMode': 'locobot', 'fieldOfView': 63.453048374758716, 'include_private_scenes': False, 'renderDepthImage': False, 'x_display': '0.4'}, 'scene_directory': '/data1/home/robustnav/datasets/robothor-pointnav/train', 'loop_dataset': True, 'allow_flipping': True}	[vector_sampled_tasks.py: 975]
10/09 16:56:28 INFO: Camera-Crack  False	[system.py: 128]
10/09 16:56:28 INFO: Motor Failure Mode (Left / Right) is  False	[system.py: 128]
10/09 16:56:28 INFO: Constant (but Uniform) translate friction False	[system.py: 128]
10/09 16:56:28 INFO: Constant (but Uniform) Rotate friction False	[system.py: 128]
10/09 16:56:28 INFO: Stoch translate friction False	[system.py: 128]
10/09 16:56:28 INFO: Stoch Rotate friction False	[system.py: 128]
10/09 16:56:28 INFO: Drift Mode is  False	[system.py: 128]
10/09 16:56:29 INFO: Camera-Crack  False	[system.py: 128]
10/09 16:56:29 INFO: Motor Failure Mode (Left / Right) is  False	[system.py: 128]
10/09 16:56:29 INFO: Constant (but Uniform) translate friction False	[system.py: 128]
10/09 16:56:29 INFO: Constant (but Uniform) Rotate friction False	[system.py: 128]
10/09 16:56:29 INFO: Stoch translate friction False	[system.py: 128]
10/09 16:56:29 INFO: Stoch Rotate friction False	[system.py: 128]
10/09 16:56:29 INFO: Drift Mode is  False	[system.py: 128]
10/09 16:56:29 INFO: Camera-Crack  False	[system.py: 128]
10/09 16:56:29 INFO: Motor Failure Mode (Left / Right) is  False	[system.py: 128]
10/09 16:56:29 INFO: Constant (but Uniform) translate friction False	[system.py: 128]
10/09 16:56:29 INFO: Constant (but Uniform) Rotate friction False	[system.py: 128]
10/09 16:56:29 INFO: Stoch translate friction False	[system.py: 128]
10/09 16:56:29 INFO: Stoch Rotate friction False	[system.py: 128]
10/09 16:56:29 INFO: Drift Mode is  False	[system.py: 128]
10/09 16:56:29 INFO: Camera-Crack  False	[system.py: 128]
10/09 16:56:29 INFO: Motor Failure Mode (Left / Right) is  False	[system.py: 128]
10/09 16:56:29 INFO: Constant (but Uniform) translate friction False	[system.py: 128]
10/09 16:56:29 INFO: Constant (but Uniform) Rotate friction False	[system.py: 128]
10/09 16:56:29 INFO: Stoch translate friction False	[system.py: 128]
10/09 16:56:29 INFO: Stoch Rotate friction False	[system.py: 128]
10/09 16:56:29 INFO: Drift Mode is  False	[system.py: 128]
Found path: /data1/home/.ai2thor/releases/thor-Linux64-8f5a485b47864604795b3a57c062ada8c0bd1c3e/thor-Linux64-8f5a485b47864604795b3a57c062ada8c0bd1c3e
Found path: /data1/home/.ai2thor/releases/thor-Linux64-8f5a485b47864604795b3a57c062ada8c0bd1c3e/thor-Linux64-8f5a485b47864604795b3a57c062ada8c0bd1c3e
Found path: /data1/home/.ai2thor/releases/thor-Linux64-8f5a485b47864604795b3a57c062ada8c0bd1c3e/thor-Linux64-8f5a485b47864604795b3a57c062ada8c0bd1c3e
Found path: /data1/home/.ai2thor/releases/thor-Linux64-8f5a485b47864604795b3a57c062ada8c0bd1c3e/thor-Linux64-8f5a485b47864604795b3a57c062ada8c0bd1c3e
Found path: /data1/home/.ai2thor/releases/thor-Linux64-8f5a485b47864604795b3a57c062ada8c0bd1c3e/thor-Linux64-8f5a485b47864604795b3a57c062ada8c0bd1c3e
Found path: /data1/home/.ai2thor/releases/thor-Linux64-8f5a485b47864604795b3a57c062ada8c0bd1c3e/thor-Linux64-8f5a485b47864604795b3a57c062ada8c0bd1c3e
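The log stops right after the six "Found path" lines, and each worker was given an x_display value ('0.0' through '0.5'). As a rough diagnostic sketch (not something from the README), the snippet below pulls those x_display values out of a saved copy of the log and reports whether the matching X socket exists. The helper names are hypothetical, and the /tmp/.X11-unix/X<n> socket location is the usual Xorg convention on Linux, assumed here rather than confirmed for this server.

```python
import os
import re

# Matches entries such as 'x_display': '0.2' in the pasted log.
X_DISPLAY_RE = re.compile(r"'x_display':\s*'(\d+)\.(\d+)'")


def extract_x_displays(log_text):
    """Return the sorted, de-duplicated (display, screen) pairs found in the log."""
    return sorted({(int(d), int(s)) for d, s in X_DISPLAY_RE.findall(log_text)})


def x_socket_path(display):
    """Conventional Unix socket path for X display number `display` (assumption)."""
    return f"/tmp/.X11-unix/X{display}"


def report(log_text):
    """Build one status line per (display, screen) pair requested in the log."""
    lines = []
    for display, screen in extract_x_displays(log_text):
        path = x_socket_path(display)
        state = "present" if os.path.exists(path) else "missing"
        lines.append(f":{display}.{screen} -> {path} ({state})")
    return lines


if __name__ == "__main__":
    # Shortened sample; in practice, read the saved training log instead.
    sample = "'x_display': '0.2'} ... 'x_display': '0.0'} ... 'x_display': '0.2'}"
    print("\n".join(report(sample)))
```

A missing socket would suggest the X server that the workers are pointed at is not running; a present socket does not by itself prove that the simulator can connect to it.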

And the nvidia-smi output is (6 A100 GPUs):

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   1323988      C   python                          19738MiB |
|    0   N/A  N/A   2670163      G   /usr/lib/xorg/Xorg                  8MiB |
|    0   N/A  N/A   2670544      G   /usr/lib/xorg/Xorg                 17MiB |
|    0   N/A  N/A   3035634      C   Train-0                          2172MiB |
|    1   N/A  N/A   2670163      G   /usr/lib/xorg/Xorg                  8MiB |
|    1   N/A  N/A   2670544      G   /usr/lib/xorg/Xorg                 17MiB |
|    1   N/A  N/A   3035851      C   Train-1                          2172MiB |
|    2   N/A  N/A   2670163      G   /usr/lib/xorg/Xorg                  8MiB |
|    2   N/A  N/A   2670544      G   /usr/lib/xorg/Xorg                 17MiB |
|    2   N/A  N/A   3036119      C   Train-2                          2172MiB |
|    3   N/A  N/A   2670163      G   /usr/lib/xorg/Xorg                  8MiB |
|    3   N/A  N/A   2670544      G   /usr/lib/xorg/Xorg                 17MiB |
|    3   N/A  N/A   3036385      C   Train-3                          2172MiB |
|    4   N/A  N/A   2670163      G   /usr/lib/xorg/Xorg                  8MiB |
|    4   N/A  N/A   2670544      G   /usr/lib/xorg/Xorg                 17MiB |
|    4   N/A  N/A   3036652      C   Train-4                          2172MiB |
|    5   N/A  N/A   2670163      G   /usr/lib/xorg/Xorg                  8MiB |
|    5   N/A  N/A   2670544      G   /usr/lib/xorg/Xorg                 17MiB |
|    5   N/A  N/A   3036922      C   Train-5                          2172MiB |
|    5   N/A  N/A   3037196      C   Valid-0                          2172MiB |
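The table above lists two Xorg processes (PIDs 2670163 and 2670544) on every GPU. Whether that is related to the hang is not established here, but it may be worth checking whether more than one X server was started. A small sketch that counts Xorg PIDs per GPU from pasted nvidia-smi process rows is shown below; the function name is hypothetical and the parser assumes the column layout shown above (GPU, GI, CI, PID, Type, Process name, GPU Memory).

```python
from collections import defaultdict


def xorg_pids_by_gpu(table_text):
    """Map GPU index -> set of Xorg PIDs found in nvidia-smi process rows."""
    result = defaultdict(set)
    for line in table_text.splitlines():
        # Drop the table borders, then split the row on whitespace.
        fields = line.strip().strip("|").split()
        # Data rows have 7 fields; header and separator rows are skipped.
        if len(fields) >= 7 and fields[0].isdigit() and fields[3].isdigit():
            gpu, pid, proc_type, name = int(fields[0]), int(fields[3]), fields[4], fields[5]
            if proc_type == "G" and name.endswith("Xorg"):
                result[gpu].add(pid)
    return dict(result)


if __name__ == "__main__":
    sample = """
|    0   N/A  N/A   2670163      G   /usr/lib/xorg/Xorg                  8MiB |
|    0   N/A  N/A   2670544      G   /usr/lib/xorg/Xorg                 17MiB |
|    0   N/A  N/A   3035634      C   Train-0                          2172MiB |
"""
    for gpu, pids in sorted(xorg_pids_by_gpu(sample).items()):
        print(f"GPU {gpu}: {len(pids)} Xorg process(es): {sorted(pids)}")
```

If more than one Xorg PID shows up per GPU, comparing the display numbers those servers use against the x_display values passed to the workers could help narrow things down.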

danelpeng, Oct 09 '23 09:10