Multi-GPU task fails with an error on AI Studio

Open NexusXi opened this issue 4 years ago • 1 comments

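The project comes from the seven-day GAN check-in camp (生成对抗网络七日打卡营), where the task is to implement super-resolution with a model of your choice. I converted it into a multi-GPU script task. The code is based on the original project's notebook, and the command that launches multi-GPU training is: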

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch main.py --config-file ../../esrgan_psnr_x4_div2k.yaml

This command is copied verbatim from: https://github.com/PaddlePaddle/PaddleGAN/blob/develop/docs/zh_CN/get_started.md#多卡训练

It produces the following warnings:

INFO 2021-04-19 19:19:50,552 launch_utils.py:475] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
INFO 2021-04-19 19:20:05,649 launch_utils.py:307] terminate all the procs
ERROR 2021-04-19 19:20:05,649 launch_utils.py:545] ABORT!!! Out of all 4 trainers, the trainer process with rank=[0] was aborted. Please check its log.
INFO 2021-04-19 19:20:08,652 launch_utils.py:307] terminate all the procs

followed by this error:

Traceback (most recent call last):
  File "main.py", line 56, in <module>
    main(args, cfg)
  File "main.py", line 32, in main
    trainer = Trainer(cfg)
  File "/root/paddlejob/workspace/code/PaddleGAN/ppgan/engine/trainer.py", line 80, in __init__
    self.distributed_data_parallel()
  File "/root/paddlejob/workspace/code/PaddleGAN/ppgan/engine/trainer.py", line 144, in distributed_data_parallel
    strategy = paddle.distributed.prepare_context()
AttributeError: module 'paddle.distributed' has no attribute 'prepare_context'
/mnt [INFO]: train job failed! train_ret: 1

Could you help me resolve this, or test it again?

NexusXi, Apr 19 '21 14:04

Your code is not up to date; updating it should fix this. This is a known issue from quite a while ago.
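For anyone who wants to confirm the mismatch before updating, here is a minimal sketch that reports which distributed-init entry point a module exposes. The helper name `pick_parallel_init` is hypothetical, and the `SimpleNamespace` objects are stand-ins so the snippet runs without Paddle installed. It assumes that recent Paddle 2.x releases provide `paddle.distributed.init_parallel_env` in place of the `prepare_context` call seen in the traceback; what the updated `trainer.py` actually calls is not shown in this thread.

```python
from types import SimpleNamespace


def pick_parallel_init(dist):
    """Return the name of the first known init function that `dist` exposes.

    `dist` is any module-like object. The candidate names are assumptions:
    `init_parallel_env` for newer Paddle, `prepare_context` for older code.
    """
    for name in ("init_parallel_env", "prepare_context"):
        if callable(getattr(dist, name, None)):
            return name
    return None


# Stand-ins for an older and a newer `paddle.distributed` module (illustrative only).
old_style = SimpleNamespace(prepare_context=lambda: None)
new_style = SimpleNamespace(init_parallel_env=lambda: None)

print(pick_parallel_init(old_style))
print(pick_parallel_init(new_style))
```

With Paddle installed, the same check would be `import paddle.distributed as dist; pick_parallel_init(dist)`. If it reports `init_parallel_env` while the local `trainer.py` still calls `prepare_context`, the checked-out PaddleGAN code is older than the installed Paddle, which matches the error above.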

LielinJiang, Apr 20 '21 03:04

This issue is very old. If you need image or video generation, you can use the new cross-modal toolkit: https://github.com/PaddlePaddle/PaddleMIX/tree/develop

JunnYu, Feb 29 '24 03:02