PaddleRobotics
Error when running Dynamic_train.py: Exception in thread Thread-2: parl.remote.exceptions.RemoteError: [PARL remote error when calling function __init__]:
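Hello, training with QuadrupedalRobots/ETGRL/train.py works without problems, but running Dynamic_train.py produces the three errors below. I checked, and ./model/Dynamic_parallel_model.py does exist relative to the directory that contains Dynamic_train.py. What could be causing this?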
Exception in thread Thread-2:
parl.remote.exceptions.RemoteError: [PARL remote error when calling function __init__]:
[Errno 2] No such file or directory: './model/Dynamic_parallel_model.py'
FileNotFoundError: [Errno 2] No such file or directory: './model/Dynamic_parallel_model.py'
parl.remote.exceptions.FutureFunctionError: There is an error raised when calling the future function __init__.
Full error output:
[04-26 20:29:14 MainThread @Dynamic_train.py:71] args:Namespace(K=20, alg='ga', eval=0, gamma=1, load='', outdir='Dynamic', sigma=0.1, steps=10000, suffix='exp0', thread=2, xparl='192.168.30.145:8037')
Exception in thread Thread-5:
Traceback (most recent call last):
File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/future_mode/proxy_wrapper_nowait.py", line 92, in _run_object_in_backend
raise e
File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/future_mode/proxy_wrapper_nowait.py", line 82, in _run_object_in_backend
self._xparl_remote_wrapper_obj = remote_wrapper(
File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/remote_wrapper.py", line 107, in __init__
raise RemoteError('init', traceback_str)
parl.remote.exceptions.RemoteError: [PARL remote error when calling function __init__]:
[Errno 2] No such file or directory: './model/Dynamic_parallel_model.py'
traceback:
Traceback (most recent call last):
File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/job.py", line 297, in wait_for_connection
cls = load_remote_class(message[1])
File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/remote_class_serialization.py", line 207, in load_remote_class
with open(file_name + '.py') as t_file:
FileNotFoundError: [Errno 2] No such file or directory: './model/Dynamic_parallel_model.py'
Exception in thread Thread-4:
Traceback (most recent call last):
File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/future_mode/proxy_wrapper_nowait.py", line 92, in _run_object_in_backend
raise e
File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/future_mode/proxy_wrapper_nowait.py", line 82, in _run_object_in_backend
self._xparl_remote_wrapper_obj = remote_wrapper(
File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/remote_wrapper.py", line 107, in __init__
raise RemoteError('init', traceback_str)
parl.remote.exceptions.RemoteError: [PARL remote error when calling function __init__]:
[Errno 2] No such file or directory: './model/Dynamic_parallel_model.py'
traceback:
Traceback (most recent call last):
File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/job.py", line 297, in wait_for_connection
cls = load_remote_class(message[1])
File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/remote_class_serialization.py", line 207, in load_remote_class
with open(file_name + '.py') as t_file:
FileNotFoundError: [Errno 2] No such file or directory: './model/Dynamic_parallel_model.py'
Traceback (most recent call last):
File "/mnt/hgfs/虚拟机/PaddleRobotics-main/QuadrupedalRobots/ETGRL/Dynamic_train.py", line 74, in __init__
You can see the detailed error message above, which is printed by another thread.
Process finished with exit code 1
Environment:
Ubuntu 18.04, Python 3.8, parl 1.4.0, torch 1.7.0, rlschool 1.0.2
@xueeinstein
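I found two similar issues in PaddlePaddle/PARL. The first (2020-07-13): the main problem was that the code being run depends on code inside a subfolder, while xparl by default only distributes the .py files in the current folder. The cause is that PARL/xparl is designed for multi-machine parallelism: it has to distribute the code in the current working directory to the different machines (and uses the same logic on a single machine). That issue gave the following workaround for parallelism on a single machine: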
export PYTHONPATH=./:$PYTHONPATH
xparl start --port XXXX
python main.py
However, this workaround did not work for me.
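Relative paths such as './model/Dynamic_parallel_model.py' are resolved against the process's current working directory, and xparl distributes code from the current working directory, so one quick check is whether the needed files are reachable from the directory the script is actually launched from (an IDE run configuration, for instance, may use a different working directory). The helper below is a hypothetical diagnostic sketch, not a PARL API, and the file list is an assumption taken from the paths mentioned in this thread.

```python
import os
from typing import Iterable, List, Optional


def missing_relative_files(paths: Iterable[str], base: Optional[str] = None) -> List[str]:
    """Return the relative paths that do not exist as files under `base`.

    Hypothetical diagnostic helper (not part of PARL). `base` defaults to the
    current working directory, which is what relative paths like
    './model/Dynamic_parallel_model.py' are resolved against.
    """
    base = os.getcwd() if base is None else base
    return [p for p in paths if not os.path.isfile(os.path.join(base, p))]


if __name__ == "__main__":
    # Assumed file list, copied from the paths mentioned in this thread.
    needed = ["./model/Dynamic_parallel_model.py", "./model/__init__.py", "./alg/es.py"]
    print("cwd:", os.getcwd())
    missing = missing_relative_files(needed)
    if missing:
        print("not found relative to cwd:", missing)
    else:
        print("all files found relative to cwd")
```

Running this from the same directory and in the same way as Dynamic_train.py shows whether the relative paths line up with what xparl will see.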
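The second (2020-08-07):
XPARL automatically distributes all files in the current folder, but not its subfolders. It cannot distribute every subfolder, because users may have complex directory structures, sometimes even with large files. To distribute the required files explicitly, try the following API:
Note that you must pass files, not folders.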
Method 1:
Change
parl.connect(xparl_addr)
to
parl.connect(xparl_addr, distributed_files=['./model/Dynamic_parallel_model.py','./alg/es.py','./model/__init__.py'])
This did not work either.
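Because distributed_files expects individual files rather than folders, a folder that has its own subfolders has to be expanded into a file list first. A minimal sketch, assuming the folders are ./model and ./alg as in Method 1; collect_py_files is a hypothetical helper, and it only builds the list, so on its own it does not explain why Method 1 failed here.

```python
import os
from typing import Iterable, List


def collect_py_files(folders: Iterable[str]) -> List[str]:
    """Recursively list every .py file under the given folders.

    Hypothetical helper (not a PARL API): distributed_files takes files,
    not folders, so each folder is walked and expanded into its .py files.
    Paths keep the form of the input folder (e.g. './model/x.py') and are sorted.
    """
    files = []
    for folder in folders:
        for root, _dirs, names in os.walk(folder):
            for name in names:
                if name.endswith(".py"):
                    files.append(os.path.join(root, name))
    return sorted(files)


# Assumed usage, with the folder names from this thread:
# parl.connect(xparl_addr, distributed_files=collect_py_files(["./model", "./alg"]))
```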
Method 2:
git clone -b xparl_submod https://github.com/PaddlePaddle/PARL/
cd PARL
pip install .
This one cannot be run at all; it fails with:
Cloning into 'PARL'...
fatal: Remote branch xparl_submod not found in upstream origin
How did you get it to run back then, and why does it fail for me? Could you give me some suggestions on how to fix this? Thanks. @xueeinstein @TomorrowIsAnOtherDay
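Quoting: "running Dynamic_train.py produces the following 3 errors"
I took a look at Dynamic_train.py, and it contains no parallel code, so I'm curious where this parallelism error comes from. Did you turn it into a parallel version yourself?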
No, I didn't modify it. This line in Dynamic_train.py:
model = ES_ParallelModel(mean_dict=MEAN_DICT, gait=GAIT_LIST, K=args.K, thread=args.thread, sigma=args.sigma,
                         dynamic_param=dynamic_param, outdir=outdir, alg=args.alg, xparl_addr=args.xparl)
calls the ES_ParallelModel class in ./model/Dynamic_parallel_model.py, and Dynamic_parallel_model.py contains the parallel code, namely these lines:
@parl.remote_class(wait=False)
class RemoteESAgent(object):
@TomorrowIsAnOtherDay
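With @parl.remote_class(wait=False), the traceback above shows the remote object being constructed in a background thread (_run_object_in_backend in Thread-4/Thread-5), which is why the FileNotFoundError is printed by those threads while the main thread only reports that "the detailed error message above ... is printed by another thread". The sketch below is a plain-Python analogy using concurrent.futures, not PARL's implementation; RemoteESAgentStub and construct_in_background are hypothetical names.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class RemoteESAgentStub:
    """Hypothetical stand-in for a remote class whose __init__ needs a source file."""

    def __init__(self, model_file: str):
        # Mirrors the failing step in the traceback: opening a file that is
        # not present in the working directory of the process running it.
        with open(model_file) as f:
            self.source = f.read()


def construct_in_background(model_file: str) -> Future:
    """Start construction without waiting for it, like a non-blocking remote call."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(RemoteESAgentStub, model_file)
    # Do not block here; the submitted task still runs to completion.
    executor.shutdown(wait=False)
    return future


if __name__ == "__main__":
    future = construct_in_background("./does_not_exist.py")
    try:
        # The exception raised in the worker thread only surfaces here,
        # when the caller asks for the result.
        future.result()
    except FileNotFoundError as err:
        print("constructor failed in background thread:", err)
```

The point of the analogy is only that, in a non-blocking setup, the root-cause error comes from the worker side and the caller sees it later and indirectly.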
Sorry, I have a lot of meetings today; we will look into this issue as soon as possible.
No rush, take your time. @TomorrowIsAnOtherDay