RelationalGraphLearning
train.py --policy cadrl problem
Output directory already exists! Overwrite the folder? (y/n)y
2023-08-09 19:38:00, INFO: Current git head hash code: 8e87aa5ed8221efd688f8e6857ba4c38637bf6e1
2023-08-09 19:38:00, INFO: Current config content is :<module 'config' from 'data/output/config.py'>
2023-08-09 19:38:00, INFO: Using device: cpu
2023-08-09 19:38:00, INFO: Similarity_func: embedded_gaussian
2023-08-09 19:38:00, INFO: Layerwise_graph: False
2023-08-09 19:38:00, INFO: Skip_connection: True
2023-08-09 19:38:00, INFO: Number of layers: 2
2023-08-09 19:38:00, INFO: Similarity_func: embedded_gaussian
2023-08-09 19:38:00, INFO: Layerwise_graph: False
2023-08-09 19:38:00, INFO: Skip_connection: True
2023-08-09 19:38:00, INFO: Number of layers: 2
2023-08-09 19:38:00, INFO: Planning depth: 1
2023-08-09 19:38:00, INFO: Planning width: 1
2023-08-09 19:38:00, INFO: Sparse search: None
2023-08-09 19:38:00, INFO: human number: 5
2023-08-09 19:38:00, INFO: Not randomize human's radius and preferred speed
2023-08-09 19:38:00, INFO: Training simulation: circle_crossing, test simulation: circle_crossing
2023-08-09 19:38:00, INFO: Square width: 20, circle width: 4
2023-08-09 19:38:00, INFO: Lr: 0.001 for parameters graph_model.w_a graph_model.w_r.0.weight graph_model.w_r.0.bias graph_model.w_r.2.weight graph_model.w_r.2.bias graph_model.w_h.0.weight graph_model.w_h.0.bias graph_model.w_h.2.weight graph_model.w_h.2.bias graph_model.Ws.0 graph_model.Ws.1 value_network.0.weight value_network.0.bias value_network.2.weight value_network.2.bias value_network.4.weight value_network.4.bias value_network.6.weight value_network.6.bias graph_model.w_a graph_model.w_r.0.weight graph_model.w_r.0.bias graph_model.w_r.2.weight graph_model.w_r.2.bias graph_model.w_h.0.weight graph_model.w_h.0.bias graph_model.w_h.2.weight graph_model.w_h.2.bias graph_model.Ws.0 graph_model.Ws.1 human_motion_predictor.0.weight human_motion_predictor.0.bias human_motion_predictor.2.weight human_motion_predictor.2.bias with Adam optimizer
  0%|          | 0/2000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 268, in <module>
    main(sys_args)
  File "train.py", line 168, in main
    explorer.run_k_episodes(il_episodes, 'train', update_memory=True, imitation_learning=True)
  File "/RelationalGraphLearning/crowd_nav/utils/explorer.py", line 43, in run_k_episodes
    ob = self.env.reset(phase)
TypeError: reset() takes 1 positional argument but 2 were given
  0%|          | 0/2000 [00:00<?, ?it/s]
I am sorry to bother you, but I couldn't understand or solve this problem.
@MandyZhang4869 please check this link: https://github.com/vita-epfl/CrowdNav/issues/45
Looks like you need to downgrade the gym version.
Hmm, what about this one? This is the error I encountered while running RGL. (I am so sorry to bother you again.)
2023-08-16 19:13:34, INFO: Lr: 0.001 for parameters graph_model.w_a graph_model.w_r.0.weight graph_model.w_r.0.bias graph_model.w_r.2.weight graph_model.w_r.2.bias graph_model.w_h.0.weight graph_model.w_h.0.bias graph_model.w_h.2.weight graph_model.w_h.2.bias graph_model.Ws.0 graph_model.Ws.1 value_network.0.weight value_network.0.bias value_network.2.weight value_network.2.bias value_network.4.weight value_network.4.bias value_network.6.weight value_network.6.bias graph_model.w_a graph_model.w_r.0.weight graph_model.w_r.0.bias graph_model.w_r.2.weight graph_model.w_r.2.bias graph_model.w_h.0.weight graph_model.w_h.0.bias graph_model.w_h.2.weight graph_model.w_h.2.bias graph_model.Ws.0 graph_model.Ws.1 human_motion_predictor.0.weight human_motion_predictor.0.bias human_motion_predictor.2.weight human_motion_predictor.2.bias with Adam optimizer
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████▉| 1998/2000 [00:38<00:00, 45.84it/s]
2023-08-16 19:14:13, INFO: TRAIN has success rate: 0.89, collision rate: 0.09, nav time: 12.23, total reward: 0.2389, average return: 0.4869
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 2000/2000 [00:38<00:00, 52.21it/s]
Traceback (most recent call last):
  File "train.py", line 268, in <module>
    main(sys_args)
  File "train.py", line 169, in main
    trainer.optimize_epoch(il_epochs)
  File "/RelationalGraphLearning/crowd_nav/utils/trainer.py", line 83, in optimize_epoch
    loss.backward()
  File "/root/anaconda3/envs/new_drl37/lib/python3.7/site-packages/torch/_tensor.py", line 489, in backward
    self, gradient, retain_graph, create_graph, inputs=inputs
  File "/root/anaconda3/envs/new_drl37/lib/python3.7/site-packages/torch/autograd/__init__.py", line 199, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [100, 6, 32]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
OK, I downgraded the torch version.
Using Python 3.7.17, I had to downgrade Gym to 0.22.0, Torch to 1.9.0 and Torchvision to 0.10.0. Now everything seems to work fine.
I also modified the following:
- line 43 in crowd_nav/utils/explorer.py from "ob = self.env.reset(phase)" to "ob = self.env.reset(phase=phase)".
- line 110 in crowd_nav/test.py from "ob = env.reset(args.phase, args.test_case)" to "ob = env.reset(phase=args.phase, test_case=args.test_case)".
- line 125 in crowd_nav/test.py from "env.render('traj', args.video_file)" to "env.render(mode='traj', output_file=args.video_file)".
- line 133 in crowd_nav/test.py from "env.render('video', args.video_file)" to "env.render(mode='video', output_file=args.video_file)".
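For anyone wondering why switching to keyword arguments helps, here is a rough sketch in plain Python. It does not use gym at all: ToyEnv and ToyWrapper are hypothetical stand-ins I made up, and it assumes (I have not checked every gym release) that the environment ends up wrapped in a class whose reset only forwards keyword arguments.

```python
class ToyEnv:
    """Hypothetical stand-in for the simulator environment (not the real one)."""

    def reset(self, phase="test", test_case=None):
        return (phase, test_case)


class ToyWrapper:
    """Hypothetical wrapper: reset accepts keyword arguments only (assumption)."""

    def __init__(self, env):
        self.env = env

    def reset(self, **kwargs):
        # Only keyword arguments are forwarded to the wrapped environment.
        return self.env.reset(**kwargs)


env = ToyWrapper(ToyEnv())

# Positional call: the wrapper has no slot for it, so Python raises TypeError.
try:
    env.reset("train")
    positional_error = None
except TypeError as exc:
    positional_error = str(exc)

# Keyword call: passes through the wrapper untouched.
keyword_result = env.reset(phase="train")

print("positional:", positional_error)
print("keyword:", keyword_result)
```

If that assumption holds, the same reasoning would apply to the env.render calls above, which is why naming mode and output_file explicitly avoids the error.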
Additionally, if you don't want to change the PyTorch version, just modify https://github.com/ChanganVR/RelationalGraphLearning/blob/8e87aa5ed8221efd688f8e6857ba4c38637bf6e1/crowd_nav/policy/graph_model.py#L127 to "next_H = next_H + H". I hope @ChanganVR can correct it.
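A minimal sketch of why the out-of-place form should help, assuming the original line is an in-place addition such as "next_H += H" applied to a ReLU output (I am guessing this from the error message, not quoting the repository). The toy layer and the helper name skip_connection_backward are mine; only the tensor shape is taken from the traceback. Whether the in-place variant actually fails depends on the PyTorch version, so the script reports the outcome instead of assuming it.

```python
import torch
import torch.nn as nn


def skip_connection_backward(in_place):
    """Return None if backward succeeds, otherwise the error message."""
    torch.manual_seed(0)
    # Toy layer ending in ReLU; a stand-in, not the real graph model.
    layer = nn.Sequential(nn.Linear(32, 32), nn.ReLU())
    H = torch.randn(100, 6, 32)  # same shape as in the error message
    next_H = layer(H)
    if in_place:
        next_H += H  # overwrites the ReLU output that autograd may have saved
    else:
        next_H = next_H + H  # allocates a new tensor, ReLU output stays intact
    try:
        next_H.sum().backward()
    except RuntimeError as exc:
        return str(exc)
    return None


out_of_place_error = skip_connection_backward(in_place=False)
in_place_error = skip_connection_backward(in_place=True)
print("out-of-place:", out_of_place_error)
print("in-place:", in_place_error)
```

On recent PyTorch releases I would expect the in-place run to print the same "modified by an inplace operation" message as in the log above, while the out-of-place run prints None.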