
When using no-shared = False, the process is blocked

Open keithyin opened this issue 8 years ago • 10 comments

Hi, today I ran the code and found that when no-shared=False, the process blocks. Do you have any suggestions for fixing that?

THANKS!

keithyin, Oct 21 '17 08:10

Blocking doesn't happen to me. What configuration are you using?

ikostrikov, Oct 21 '17 19:10

Ubuntu 16.04, PyTorch 0.2. I just ran the downloaded source code without modifying anything, and it blocks. But if I use no-shared=True, the code runs.
It is weird.

keithyin, Oct 22 '17 07:10

Same here, using Ubuntu 16.04, PyTorch 0.2, and Python 3.5. It works fine on OS X, though.

wnstlr, Dec 04 '17 20:12

Anyone found a solution?

ShaniGam, Dec 07 '17 15:12

Please report more information.

I tested it on Ubuntu 16.04 with PyTorch 0.2 and 0.3 and Python 3.6, and it works for me on both Ubuntu and OS X.

ikostrikov, Dec 07 '17 20:12

Ubuntu 16.04, PyTorch 0.2, Python 3.5. When I exit with Ctrl-C, the traceback shows that the process is stuck at p.join().

^CTraceback (most recent call last):
  File "main.py", line 77, in <module>
    p.join()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 121, in join
    res = self._popen.wait(timeout)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 51, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 29, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 29, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt

wnstlr, Dec 07 '17 21:12

It's the exact same problem as in: https://github.com/pytorch/pytorch/issues/2496

It's stuck on the ConvNd call:

    f = ConvNd(_pair(stride), _pair(padding), _pair(dilation), False,
               _pair(0), groups, torch.backends.cudnn.benchmark,
               torch.backends.cudnn.enabled)
    return f(input, weight, bias)

ShaniGam, Dec 08 '17 08:12

I got the same problem with PyTorch 0.3. I can run this code on macOS, but not on Ubuntu 16.04.

japan4415, Jan 02 '18 21:01

I found a way! Call mp.set_start_method("spawn") and change F.softmax(logit) to F.softmax(logit, dim=1).

japan4415, Jan 10 '18 00:01
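
For readers wondering what the dim=1 part of the comment above does, here is a minimal sketch. It assumes logit has shape (batch, num_actions), which is not stated in the thread, and it uses made-up values and a recent PyTorch API (torch.tensor) rather than the 0.2/0.3 versions discussed here.

```python
import torch
import torch.nn.functional as F

# Assumed shape: (batch, num_actions). The values are illustrative only.
logit = torch.tensor([[1.0, 2.0, 3.0],
                      [1.0, 1.0, 1.0]])

# dim=1 normalizes across the action dimension, so each row becomes
# a probability distribution over actions.
prob = F.softmax(logit, dim=1)

# Each row should sum to 1 (up to floating-point error).
row_sums = prob.sum(dim=1)
print(prob)
print(row_sums)
```

Passing dim explicitly removes any ambiguity about which axis is normalized, instead of relying on the implicit choice made when the argument is omitted.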

@japan4415

Thanks for sharing your solution. mp.set_start_method("spawn") should be placed inside the if __name__ == '__main__' block, according to this issue on pytorch. After that, everything works fine.
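
A minimal sketch of the placement described above, not the repository's actual main.py. To stay runnable without PyTorch it uses the standard-library multiprocessing module, on the assumption that torch.multiprocessing (imported as mp in this thread) exposes the same set_start_method, Process, and Value calls. The train function and the counter are hypothetical stand-ins for the real worker function and shared state.

```python
import multiprocessing as mp


def train(rank, counter):
    """Hypothetical stand-in for the per-worker training function."""
    with counter.get_lock():
        counter.value += 1


def main(num_processes=2):
    # Shared integer, standing in for whatever state the workers share.
    counter = mp.Value("i", 0)
    processes = []
    for rank in range(num_processes):
        p = mp.Process(target=train, args=(rank, counter))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    return counter.value


if __name__ == "__main__":
    # Call this once, inside the main guard and before creating any
    # Process. With "spawn", each child starts a fresh interpreter and
    # re-imports this module, so unguarded top-level code would run
    # again in every child.
    mp.set_start_method("spawn")
    print("workers finished:", main())
```

The guard matters because set_start_method raises a RuntimeError if it is called a second time, which is what would happen if a spawned child re-executed an unguarded call while importing the module.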