WRN No vacant cpu resources at the moment, will try 300 times later

Open rivellijp opened this issue 5 years ago • 8 comments

I can't run any code with parl; I always get this error. This is how I start the cluster on my local machine (Windows 10):

xparl start --port 8010

# The Parl cluster is started at localhost:8010.
# A local worker with 8 CPUs is connected to the cluster.
# Starting the cluster monitor...
## If you want to check cluster status, please view:
http://192.168.1.99:61581
or call:
xparl status
## If you want to add more CPU resources, please call:
xparl connect --address 192.168.1.99:8010
## If you want to shutdown the cluster, please call:
xparl stop

And this is what I get with the status command:

xparl status

# Cluster localhost:8010 has 0 used cpus, 0 vacant cpus.
# If you want to check cluster status, please view: http://192.168.1.99:61721

rivellijp avatar Nov 04 '20 04:11 rivellijp
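
The status line above can also be checked from a script. The following is a minimal sketch, assuming the line keeps the exact wording quoted in this thread (other parl versions may print it differently). `parse_cluster_status` and `worker_registered` are hypothetical helper names, not part of parl.

```python
import re
from typing import Optional, Tuple

# Pattern taken from the status line quoted in this thread (assumption:
# the wording is stable across the parl versions you care about).
_STATUS_RE = re.compile(r"has (\d+) used cpus?, (\d+) vacant cpus?")


def parse_cluster_status(text: str) -> Optional[Tuple[int, int]]:
    """Return (used, vacant) CPU counts, or None if no status line is found."""
    match = _STATUS_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def worker_registered(text: str) -> bool:
    """True when the cluster reports at least one CPU, used or vacant."""
    counts = parse_cluster_status(text)
    return counts is not None and sum(counts) > 0
```

Fed the output of xparl status, `worker_registered` returns False for the "0 used cpus, 0 vacant cpus" case reported here, which suggests that no worker CPUs are registered with the cluster.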

Hi, thanks for your feedback. Can you provide more environment information?

  • Python version
  • parl version
  • running terminal

zenghsh3 avatar Nov 04 '20 04:11 zenghsh3

Python 3.7.9, parl==1.3.2, running at the command line

rivellijp avatar Nov 04 '20 05:11 rivellijp

Hi, I cannot reproduce the error in the same environment (Win10, Python 3.7.9 and parl==1.3.2). It looks like the worker cannot start normally. Can you run the command xparl connect --address 192.168.1.99:8010 after running xparl start --port 8010?

Then tell us what error information it prints.

zenghsh3 avatar Nov 04 '20 07:11 zenghsh3

I thought it was something about Win10. Since you couldn't reproduce the error, I cleaned up everything and reinstalled Python and parl at the same versions, and now it's working. Thanks!

# Cluster localhost:8010 has 0 used cpus, 8 vacant cpus.

rivellijp avatar Nov 04 '20 13:11 rivellijp

Glad to hear that. Feel free to reopen the issue if you have other problems:)

TomorrowIsAnOtherDay avatar Nov 04 '20 15:11 TomorrowIsAnOtherDay

I have the issue again, but I have now narrowed it down a little more. With a clean install of Python + parl only, I can start, get status and stop many times with no issue:

# Cluster localhost:8010 has 0 used cpus, 8 vacant cpus.

But then, after installing pytorch (tried 1.6.0 and 1.7.0):

# Cluster localhost:8010 has 0 used cpus, 0 vacant cpus.

After uninstalling pytorch, parl works again:

# Cluster localhost:8010 has 0 used cpus, 8 vacant cpus.

Somehow pytorch is interfering with parl. Any ideas?

rivellijp avatar Nov 04 '20 17:11 rivellijp
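
Since the behaviour changes when pytorch is installed, it may help to record the exact package versions before and after the install and compare them. A minimal sketch: `installed_versions` is a hypothetical helper, and the package list is only a guess at what is relevant (numpy is included as a common shared dependency). On Python 3.7 it assumes the `importlib_metadata` backport is installed.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Assumption: the importlib_metadata backport is available on Python 3.7.
    import importlib_metadata as metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Calling `installed_versions(["parl", "torch", "numpy"])` once in the working environment and once in the broken one would show whether installing pytorch changed anything besides adding torch itself.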

Hi, I still cannot reproduce the error (I installed torch==1.7.0). Could you run the command xparl connect --address 192.168.1.99:8010 and see what happens?

zenghsh3 avatar Nov 05 '20 02:11 zenghsh3

Hi, I ran into the same problem when running the alphago project in benchmark. Environment: Python 3.7.9, parl==1.3.2, torch==1.7.0 (tried both the CPU and GPU versions), running at the command line on Ubuntu 18.04.

xparl status

[09-10 15:53:24 MainThread @logger.py:224] Argv: /home/hxu/anaconda3/envs/parl/bin/xparl connect --address 192.168.70.105:8010
/home/hxu/anaconda3/envs/parl/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
  return f(*args, **kwds)
/home/hxu/anaconda3/envs/parl/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
  return f(*args, **kwds)

(parl) hxu@hxu:~/netease/PARL/benchmark/torch/AlphaZero$ xparl connect --address 192.168.70.105:8010

[09-10 15:53:24 MainThread @logger.py:224] Argv: /home/hxu/anaconda3/envs/parl/bin/xparl connect --address 192.168.70.105:8010
/home/hxu/anaconda3/envs/parl/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
  return f(*args, **kwds)
/home/hxu/anaconda3/envs/parl/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
  return f(*args, **kwds)

python main.py # in AlphaGo Dirs

[09-10 15:53:34 MainThread @remote_decorator.py:178] WRN No vacant cpu resources at the moment, will try 300 times later.

R-Ceph avatar Sep 10 '21 07:09 R-Ceph
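
The numpy.ufunc size changed warning in the logs above is typically emitted when a compiled extension was built against a different NumPy version than the one installed. Whether it is related to the missing CPUs is not established in this thread. To find which import triggers it, something like the following sketch could be run in a fresh interpreter. `runtime_warnings_on_import` and `mentions_binary_incompatibility` are hypothetical helper names.

```python
import importlib
import warnings


def runtime_warnings_on_import(module_name):
    """Import a module and return the RuntimeWarning messages it emitted.

    A module that is already imported is served from the import cache and
    emits nothing, so call this in a fresh interpreter.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        importlib.import_module(module_name)
    return [str(w.message) for w in caught
            if issubclass(w.category, RuntimeWarning)]


def mentions_binary_incompatibility(messages):
    """True if any message contains the phrase seen in the logs above."""
    return any("binary incompatibility" in message for message in messages)
```

Running `runtime_warnings_on_import` on numpy, torch and parl one at a time, each in a new Python process, would show which package first triggers the warning in that environment.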