visual-navigation-agent-pytorch
Program stops at load_state_dict()
I tried to run the program and it stopped in the function _sync_network(). I found that it actually stops at the load_state_dict() call inside _sync_network(), but I cannot solve it. There is no error message.
What OS/python/torch version do you use? Can you dump stacktrace? Do you run it in debug mode (in this case I would need more precise description of your configuration)?
@jkulhanek Thank you for your reply. I tried PyTorch 1.1 and 1.3 on Linux with Python 3.6.9, and there is no error or warning message. I used the code from load_state_dict to test load_state_dict, and it works.
Also, I don't know what debug mode is. I just use the command 'python3 train.py' to run the program.
Ok, can you try to replace the code spawning multiple processes with single function call? You can do that by changing line 229 in the train.py file. This way we can check if you have a problem with multiprocessing.
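A minimal sketch of that kind of change, assuming the workers are started through multiprocessing.Process. The helper run_workers and its single_process flag are hypothetical names for illustration and are not taken from train.py:

```python
import multiprocessing as mp


def run_workers(target, args_list, single_process=False):
    """Call target(*args) once for each args tuple in args_list.

    With single_process=True the calls run one after another in the
    current process, so multiprocessing is not involved at all. This is
    only meant for debugging a suspected multiprocessing problem.
    """
    if single_process:
        for args in args_list:
            target(*args)
        return

    # Normal path: one child process per args tuple.
    processes = [mp.Process(target=target, args=args) for args in args_list]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
```

If the hang disappears with single_process=True, the problem most likely lies in how the processes are started or how they share state, not in the model code itself.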
@jkulhanek I tried thread.run() to launch the program and it succeeded. So I tried this program with torch 0.4.1, and the policy_network succeeded in loading the state_dict, but the program stopped at (policy, value) = policy_network(...). I found that all the problems involve policy_network, but there is still no error message :(
Ok, then it might be a problem with multiprocessing causing the deadlock. Can you please follow the instructions here: http://code.activestate.com/recipes/577334-how-to-debug-deadlocked-multi-threaded-programs/ to dump the stacktrace and upload the result here.
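For reference, the core of such a stack dump can be sketched with the standard library alone. This is not the linked recipe itself; format_all_stacks is a hypothetical helper name, and it only sees threads of the process it runs in, so each worker process would need to call it separately:

```python
import sys
import threading
import traceback


def format_all_stacks():
    """Return the current stack of every thread in this process as text."""
    # Map thread identifiers to readable names where they are known.
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    # sys._current_frames() maps each thread id to its topmost frame.
    for thread_id, frame in sys._current_frames().items():
        lines.append("ThreadID: %s (%s)" % (thread_id, names.get(thread_id, "unknown")))
        lines.extend(entry.rstrip() for entry in traceback.format_stack(frame))
    return "\n".join(lines)
```

Calling print(format_all_stacks()) from a background thread every few seconds shows where each thread is blocked. The standard library's faulthandler.dump_traceback_later(timeout) is an alternative that needs no helper thread.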
@jkulhanek It seems to be caused by waiting on a pid:
ThreadID: 140068893906688
File: "/usr/lib/python3.6/threading.py", line 884, in _bootstrap
  self._bootstrap_inner()
File: "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
  self.run()
File: "/home/usr/ws/visual-navigation-agent-pytorch/stacktracer.py", line 64, in run
  self.stacktraces()
File: "/home/usr/ws/visual-navigation-agent-pytorch/stacktracer.py", line 78, in stacktraces
  fout.write(stacktraces())
File: "/home/usr/ws/visual-navigation-agent-pytorch/stacktracer.py", line 26, in stacktraces
  for filename, lineno, name, line in traceback.extract_stack(stack):
ThreadID: 140068973619008
File: "/usr/lib/python3.6/multiprocessing/util.py", line 319, in _exit_function
  p.join()
File: "/usr/lib/python3.6/multiprocessing/process.py", line 124, in join
  res = self._popen.wait(timeout)
File: "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 50, in wait
  return self.poll(os.WNOHANG if timeout == 0.0 else 0)
File: "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
  pid, sts = os.waitpid(self.pid, flag)
@jkulhanek When I release the code mp.set_start_method("spawn"), the program runs, but it cannot be stopped when I press Ctrl+C. I also think this operation may have negative impact on the calculation result.
@jkulhanek Any ideas, please?
What is the problem? What do you mean by "have negative impact on the calculation result"? The reason you are not able to stop the program is that it uses multiprocessing, and I do not propagate the signal to the other processes. It can be solved by catching signals and killing the processes, but it was not a concern at the time of writing the code.
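A rough sketch of what catching the signal and killing the processes could look like, assuming the parent keeps a list of its multiprocessing.Process objects. The names terminate_all and install_sigint_handler are hypothetical and do not exist in the repository:

```python
import signal
import sys


def terminate_all(processes):
    """Terminate every process that is still alive, then wait for all of them."""
    for process in processes:
        if process.is_alive():
            process.terminate()
    for process in processes:
        process.join()


def install_sigint_handler(processes):
    """Make Ctrl+C in the parent terminate the given child processes.

    Call this from the main thread of the parent process, after the
    children have been started.
    """
    def handler(signum, frame):
        terminate_all(processes)
        sys.exit(1)

    signal.signal(signal.SIGINT, handler)
```

terminate() stops a child abruptly, so any cleanup code in the workers will not run. That is usually acceptable for interrupting a training run by hand, but it is a design choice and not the only option.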