lm-human-preferences

Got an error that I can't trace

Open mysterefrank opened this issue 6 years ago • 1 comment

Hi all, I'm getting an error that is difficult to trace - any advice?

Traceback (most recent call last):
  File "./sample.py", line 73, in <module>
    sample=launch_sample,
  File "/Users/mysterefrank/deep_collective_fun/lm-human-preferences/lm_human_preferences/utils/launch.py", line 65, in main
    fire.Fire(_Commands)
  File "/Users/mysterefrank/.local/share/virtualenvs/lm-human-preferences-9-VNjZ2b/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
    component_trace = _Fire(component, args, context, name)
  File "/Users/mysterefrank/.local/share/virtualenvs/lm-human-preferences-9-VNjZ2b/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
    component, remaining_args)
  File "/Users/mysterefrank/.local/share/virtualenvs/lm-human-preferences-9-VNjZ2b/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
    result = fn(*varargs, **kwargs)
  File "./sample.py", line 69, in launch_sample
    launch.launch('sample', partial(sample_policy, **kwargs), mode=mode, mpi=mpi)
  File "/Users/mysterefrank/deep_collective_fun/lm-human-preferences/lm_human_preferences/utils/launch.py", line 13, in launch
    subprocess.check_call(['mpiexec', '-n', str(mpi), 'python', '-c', 'import sys; import pickle; pickle.loads(open("/tmp/pickle_fn", "rb").read())()'], stderr=subprocess.STDOUT)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['mpiexec', '-n', '1', 'python', '-c', 'import sys; import pickle; pickle.loads(open("/tmp/pickle_fn", "rb").read())()']' returned non-zero exit status 1.
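For reference, the failing child command just unpickles a callable from /tmp/pickle_fn and invokes it, and check_call reduces whatever goes wrong inside it to "returned non-zero exit status 1". One way to surface the real traceback is to run that step directly, without mpiexec. A minimal debugging sketch, assuming the pickled function from a failed launch is still at /tmp/pickle_fn:

    # Debugging sketch (not part of the repo): run the pickled function
    # directly so its exception prints a full traceback instead of being
    # reduced to an exit status by subprocess.check_call.
    import pickle

    with open("/tmp/pickle_fn", "rb") as f:
        fn = pickle.loads(f.read())
    fn()  # any failure now shows the real traceback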

mysterefrank • Nov 09 '19 00:11

FWIW, I got similar errors. There were two problems. First, the mpiexec command was invoking Python 2 instead of Python 3, so I had to hardwire it to call python3. Second, I couldn't get it to run on a single GPU; fortunately, I have 2 GPUs locally and could run on those. I didn't test this, but my theory is that mpiexec can't run on 1 GPU, so if you want to run on 1 GPU (OA used 8 GPUs for all the experiments, IIRC), you may need to remove the mpiexec calls entirely.
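Concretely, a sketch of both changes against the subprocess call visible in the traceback (launch.py, line 13). The surrounding launch.py code is an assumption, not verified against the repo, and sys.executable is used rather than a bare 'python3' string so the child runs under the same interpreter as the parent:

    # Sketch of the two workarounds described above; 'mpi' and 'cmd' mirror
    # the values visible in the traceback and are assumptions, not repo code.
    import subprocess
    import sys

    mpi = 1  # number of MPI processes
    cmd = ('import sys; import pickle; '
           'pickle.loads(open("/tmp/pickle_fn", "rb").read())()')

    # Fix 1: hardwire the child interpreter so mpiexec doesn't pick up Python 2.
    subprocess.check_call(
        ['mpiexec', '-n', str(mpi), sys.executable, '-c', cmd],
        stderr=subprocess.STDOUT)

    # Fix 2 (untested, per the note above): for a single process, bypass
    # mpiexec entirely and launch the worker as a plain subprocess.
    subprocess.check_call([sys.executable, '-c', cmd], stderr=subprocess.STDOUT)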

gwern • Dec 22 '19 23:12