PaddleRobotics

Running Dynamic_train.py fails with Exception in thread Thread-2: parl.remote.exceptions.RemoteError: [PARL remote error when calling function __init__]:

Open Senwei-Huang opened this issue 2 years ago • 5 comments

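Hello, I can run QuadrupedalRobots/ETGRL/train.py for training without any problem, but running Dynamic_train.py produces the three errors below. The file ./model/Dynamic_parallel_model.py does exist in the same directory as Dynamic_train.py. What could be causing this?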

Exception in thread Thread-2: parl.remote.exceptions.RemoteError: [PARL remote error when calling function __init__]:

[Errno 2] No such file or directory: './model/Dynamic_parallel_model.py'
FileNotFoundError: [Errno 2] No such file or directory: './model/Dynamic_parallel_model.py'

parl.remote.exceptions.FutureFunctionError: There is an error raised when calling the future function __init__.
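One thing worth checking: in the full traceback, the failing open() call sits inside parl/remote/job.py. The relative path './model/Dynamic_parallel_model.py' is therefore presumably resolved against the working directory of that job process, not against the directory containing Dynamic_train.py. The sketch below is only a diagnostic aid, not part of PARL. The helper name describe_relative_path is hypothetical. It reports whether a relative path exists when resolved from the current working directory and when resolved from a script's directory.

```python
import os
import sys


def describe_relative_path(rel_path, script_file):
    """Report how rel_path resolves from the cwd and from the script's directory."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return {
        "cwd": os.getcwd(),
        "exists_from_cwd": os.path.exists(rel_path),
        "exists_from_script_dir": os.path.exists(os.path.join(script_dir, rel_path)),
    }


if __name__ == "__main__":
    # Path copied from the error message; sys.argv[0] stands in for the launched script.
    print(describe_relative_path("./model/Dynamic_parallel_model.py", sys.argv[0]))
```

If exists_from_script_dir is true while exists_from_cwd is false, the file is present but the process was started from a different directory.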

Full error output

[04-26 20:29:14 MainThread @Dynamic_train.py:71] args:Namespace(K=20, alg='ga', eval=0, gamma=1, load='', outdir='Dynamic', sigma=0.1, steps=10000, suffix='exp0', thread=2, xparl='192.168.30.145:8037')
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/future_mode/proxy_wrapper_nowait.py", line 92, in _run_object_in_backend
    raise e
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/future_mode/proxy_wrapper_nowait.py", line 82, in _run_object_in_backend
    self._xparl_remote_wrapper_obj = remote_wrapper(
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/remote_wrapper.py", line 107, in __init__
    raise RemoteError('__init__', traceback_str)
parl.remote.exceptions.RemoteError: [PARL remote error when calling function __init__]:
[Errno 2] No such file or directory: './model/Dynamic_parallel_model.py'
traceback:
Traceback (most recent call last):
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/job.py", line 297, in wait_for_connection
    cls = load_remote_class(message[1])
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/remote_class_serialization.py", line 207, in load_remote_class
    with open(file_name + '.py') as t_file:
FileNotFoundError: [Errno 2] No such file or directory: './model/Dynamic_parallel_model.py'

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/future_mode/proxy_wrapper_nowait.py", line 92, in _run_object_in_backend
    raise e
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/future_mode/proxy_wrapper_nowait.py", line 82, in _run_object_in_backend
    self._xparl_remote_wrapper_obj = remote_wrapper(
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/remote_wrapper.py", line 107, in __init__
    raise RemoteError('__init__', traceback_str)
parl.remote.exceptions.RemoteError: [PARL remote error when calling function __init__]:
[Errno 2] No such file or directory: './model/Dynamic_parallel_model.py'
traceback:
Traceback (most recent call last):
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/job.py", line 297, in wait_for_connection
    cls = load_remote_class(message[1])
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/remote_class_serialization.py", line 207, in load_remote_class
    with open(file_name + '.py') as t_file:
FileNotFoundError: [Errno 2] No such file or directory: './model/Dynamic_parallel_model.py'

Traceback (most recent call last):
  File "/mnt/hgfs/虚拟机/PaddleRobotics-main/QuadrupedalRobots/ETGRL/Dynamic_train.py", line 74, in <module>
    main()
  File "/mnt/hgfs/虚拟机/PaddleRobotics-main/QuadrupedalRobots/ETGRL/Dynamic_train.py", line 72, in main
    model.train(args.steps)
  File "/mnt/hgfs/虚拟机/PaddleRobotics-main/QuadrupedalRobots/ETGRL/model/Dynamic_parallel_model.py", line 159, in train
    mean_re = self.update(epoch)
  File "/mnt/hgfs/虚拟机/PaddleRobotics-main/QuadrupedalRobots/ETGRL/model/Dynamic_parallel_model.py", line 128, in update
    future_objects.append(self.agent_list[i].batch_sample_episodes(param=solutions[i*self.K:(i+1)*self.K,:],K = self.K))
  File "/home/senweihuang/anaconda3/envs/parl/lib/python3.8/site-packages/parl/remote/future_mode/proxy_wrapper_nowait.py", line 144, in __getattr__
    raise self._xparl_remote_object_exception
parl.remote.exceptions.FutureFunctionError: There is an error raised when calling the future function __init__. You can see the detailed error message above, which is printed by another thread.

Process finished with exit code 1
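The output above has a two-stage shape: the constructor fails on background threads (Thread-4, Thread-5), and the main thread only hits FutureFunctionError later, on the first method call through `__getattr__`. The following is a simplified analogy of that pattern in plain Python, not PARL's actual implementation. The class name DeferredProxy is hypothetical. Unlike PARL, it stores the constructor error silently instead of also printing it from the worker thread.

```python
import threading


class DeferredProxy:
    """Construct `cls` on a background thread; surface constructor errors on first use."""

    def __init__(self, cls, *args, **kwargs):
        self._obj = None
        self._error = None
        self._thread = threading.Thread(target=self._build, args=(cls, args, kwargs))
        self._thread.start()

    def _build(self, cls, args, kwargs):
        try:
            self._obj = cls(*args, **kwargs)
        except Exception as exc:  # remember the failure so it can be re-raised later
            self._error = exc

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself,
        # i.e. attributes and methods of the wrapped object.
        self._thread.join()
        if self._error is not None:
            raise RuntimeError("constructor failed earlier") from self._error
        return getattr(self._obj, name)
```

Read this way, the FutureFunctionError in the main thread is a consequence of the earlier FileNotFoundError, so the missing-file error is the one to fix.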

Environment

Ubuntu 18.04
python 3.8
parl = 1.4.0
torch = 1.7.0
rlschool = 1.0.2

@xueeinstein

Senwei-Huang, Apr 27 '22 01:04

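I found two similar issues under PaddlePaddle / PARL. The first (2020-07-13): the main problem is that the running code depends on code inside subfolders, while xparl by default only distributes the .py files in the current folder. This stems from PARL/xparl being designed for multi-machine parallelism, so it has to distribute the code in the current working directory to different machines (the same logic is used on a single machine). That issue offered a workaround for parallelism on a single machine: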

export PYTHONPATH=./:$PYTHONPATH
xparl start --port XXXX 
python main.py

However, this method did not work.

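The second (2020-08-07): XPARL automatically distributes all files in the current folder, but not those in its subfolders. XPARL cannot distribute every subfolder, because users may have complex directory structures, sometimes even with large files. To distribute the required files explicitly, try the following API. Note that you must pass files, not folders.

Method 1: change parl.connect(xparl_addr) to parl.connect(xparl_addr, distributed_files=['./model/Dynamic_parallel_model.py','./alg/es.py','./model/__init__.py'])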

This method did not work either.
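For reference, the distributed_files approach from Method 1 can be sketched as below. Since files rather than folders must be passed, the list is built by expanding each subfolder into its .py files. The helper names collect_distributed_files and connect_with_subfolders are hypothetical; only parl.connect and its distributed_files argument come from the thread. As noted above, this did not resolve the error in this setup, so treat it as an illustration of the API call rather than a confirmed fix.

```python
import glob
import os


def collect_distributed_files(subdirs, root="."):
    """Return './sub/file.py'-style paths for every .py file directly inside each subdir."""
    files = []
    for sub in subdirs:
        pattern = os.path.join(root, sub, "*.py")
        for path in sorted(glob.glob(pattern)):
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            files.append("./" + rel)
    return files


def connect_with_subfolders(xparl_addr, subdirs):
    """Call parl.connect with explicitly listed files (requires PARL to be installed)."""
    import parl  # imported lazily so the helper above works without PARL

    parl.connect(xparl_addr, distributed_files=collect_distributed_files(subdirs))
```

A call such as connect_with_subfolders(args.xparl, ["model", "alg"]) would then stand in for the hand-written list, assuming it is run from the directory that contains those subfolders.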

Method 2:

git clone -b xparl_submod https://github.com/PaddlePaddle/PARL/
cd PARL
pip install .

This method could not be run. It fails with:

Cloning into 'PARL'...
fatal: Remote branch xparl_submod not found in upstream origin

Hello, how did you get this running at the time, and why does it fail on my side? Could you offer some suggestions for fixing it? Thanks. @xueeinstein @TomorrowIsAnOtherDay

Senwei-Huang, Apr 27 '22 13:04

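"Running Dynamic_train.py produces the three errors below"

I took a look and Dynamic_train.py itself contains no parallel code, so I am curious where this parallel error comes from. Did you adapt it into a parallel version yourself?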

TomorrowIsAnOtherDay, Apr 27 '22 13:04

I did not modify anything. This line in Dynamic_train.py:

model = ES_ParallelModel(mean_dict=MEAN_DICT,gait=GAIT_LIST,K=args.K,thread = args.thread,sigma=args.sigma,
                             dynamic_param=dynamic_param,outdir=outdir,alg=args.alg,xparl_addr = args.xparl)

calls the ES_ParallelModel class in ./model/Dynamic_parallel_model.py, and Dynamic_parallel_model.py does contain parallel code, namely these lines:

@parl.remote_class(wait=False)
class RemoteESAgent(object):

@TomorrowIsAnOtherDay

Senwei-Huang, Apr 27 '22 14:04

Sorry, I have a lot of meetings today. We will look into this as soon as we can.

TomorrowIsAnOtherDay, Apr 28 '22 02:04

No rush, take your time. @TomorrowIsAnOtherDay

Senwei-Huang, Apr 28 '22 03:04