GAN2Shape
RuntimeError: Attempting to deserialize object on CUDA device 1 but torch.cuda.device_count() is 1.
I installed all the dependencies, including CUDA, but now I'm getting the following error when I run:
sh scripts/run_car.sh
Here is the output (all four launched processes print the same config; the ranks assigned cuda:1, cuda:2 and cuda:3 each crash with the same traceback, shown once below):
Load config from yml file: configs/car.yml
Loading configs from configs/car.yml
{'checkpoint_dir': 'results/car', 'save_checkpoint_freq': 500, 'keep_num_checkpoint': 2, 'use_logger': True, 'log_freq': 100, 'joint_train': False, 'independent': False, 'reset_weight': True, 'save_results': True, 'num_stage': 4, 'flip1_cfg': [False, False, False, False], 'flip3_cfg': [False, False, False, False], 'stage_len_dict': {'step1': 700, 'step2': 700, 'step3': 600}, 'stage_len_dict2': {'step1': 200, 'step2': 500, 'step3': 400}, 'image_size': 128, 'load_gt_depth': False, 'img_list_path': 'data/car/list.txt', 'img_root': 'data/car', 'latent_root': 'data/car/latents', 'model_name': 'gan2shape_car', 'category': 'car', 'share_weight': True, 'relative_enc': False, 'use_mask': True, 'add_mean_L': True, 'add_mean_V': True, 'min_depth': 0.9, 'max_depth': 1.1, 'xyz_rotation_range': 60, 'xy_translation_range': 0.1, 'z_translation_range': 0, 'collect_iters': 100, 'batchsize': 8, 'lr': 0.0001, 'lam_perc': 0.5, 'lam_smooth': 0.01, 'lam_regular': 0.01, 'view_mvn_path': 'checkpoints/view_light/view_mvn.pth', 'light_mvn_path': 'checkpoints/view_light/light_mvn.pth', 'rand_light': [-1, 1, -0.2, 0.8, -0.1, 0.6, -0.6], 'channel_multiplier': 2, 'gan_size': 512, 'gan_ckpt': 'checkpoints/stylegan2/stylegan2-car-config-f.pt', 'F1_d': 2, 'rot_center_depth': 1.0, 'fov': 10, 'tex_cube_size': 2, 'config': 'configs/car.yml', 'seed': 0, 'num_workers': 4, 'distributed': True}
Setting up Perceptual loss...
Loading model from: /home/darkayserleo/Documentos/Tesis/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/weights/v0.1/vgg.pth
Traceback (most recent call last):
File "run.py", line 31, in <module>
trainer = Trainer(cfgs, GAN2Shape)
File "/home/darkayserleo/Documentos/Tesis/GAN2Shape/gan2shape/trainer.py", line 23, in __init__
self.model = model(cfgs)
File "/home/darkayserleo/Documentos/Tesis/GAN2Shape/gan2shape/model.py", line 89, in __init__
model='net-lin', net='vgg', use_gpu=True, gpu_ids=[torch.device(self.rank)]
File "/home/darkayserleo/Documentos/Tesis/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/__init__.py", line 22, in __init__
self.model.initialize(model=model, net=net, use_gpu=use_gpu, colorspace=colorspace, spatial=self.spatial, gpu_ids=gpu_ids)
File "/home/darkayserleo/Documentos/Tesis/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/dist_model.py", line 75, in initialize
self.net.load_state_dict(torch.load(model_path, **kw), strict=False)
File "/home/darkayserleo/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/serialization.py", line 529, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/darkayserleo/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/serialization.py", line 702, in _legacy_load
result = unpickler.load()
File "/home/darkayserleo/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/serialization.py", line 665, in persistent_load
deserialized_objects[root_key] = restore_location(obj, location)
File "/home/darkayserleo/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/serialization.py", line 740, in restore_location
return default_restore_location(storage, str(map_location))
File "/home/darkayserleo/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/serialization.py", line 156, in default_restore_location
result = fn(storage, location)
File "/home/darkayserleo/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/serialization.py", line 132, in _cuda_deserialize
device = validate_cuda_device(location)
File "/home/darkayserleo/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/serialization.py", line 126, in validate_cuda_device
device, torch.cuda.device_count()))
RuntimeError: Attempting to deserialize object on CUDA device 1 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.
RuntimeError: Attempting to deserialize object on CUDA device 3 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.
RuntimeError: Attempting to deserialize object on CUDA device 2 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.
...[net-lin [vgg]] initialized
...Done
Please, I need help. Thanks!
@leonel-os It seems that your machine has only 1 GPU, while our scripts require at least 4 GPUs. You need to revise the run_car.sh script accordingly to run on one GPU. Specifically, change CUDA_VISIBLE_DEVICES=0,1,2,3 to CUDA_VISIBLE_DEVICES=0 and disable distributed training. However, you may get sub-optimal quality with only one GPU, so I suggest running on more GPUs if possible.
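For reference, the error message itself also points at a second workaround: passing map_location to torch.load so the LPIPS weights saved for cuda:1/2/3 are remapped onto a device that actually exists. A minimal sketch under that assumption (standalone; the weight path is the one printed in the log, relative to the repo root, and the variable names are illustrative rather than the repo's own):

import torch

# Remap every CUDA storage in the checkpoint onto the single visible GPU (or CPU).
# This mirrors the torch.load call at lpips/dist_model.py:75 in the traceback above.
weight_path = 'gan2shape/stylegan2/stylegan2-pytorch/lpips/weights/v0.1/vgg.pth'
target = torch.device('cuda', 0) if torch.cuda.is_available() else torch.device('cpu')
state_dict = torch.load(weight_path, map_location=target)
print(next(iter(state_dict.values())).device)  # all tensors now live on `target`

Note that this only fixes the deserialization step; with a single GPU you still need the single-process launch described above.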
@XingangPan thanks, changing the run_car.sh configuration fixed the error.
EXP=car
CONFIG=car
GPUS=1
PORT=${PORT:-29577}
mkdir -p results/${EXP}
CUDA_VISIBLE_DEVICES=0 \
python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \
run.py \
--launcher pytorch \
--config configs/${CONFIG}.yml \
2>&1 | tee results/${EXP}/log.txt
But now I'm getting the following error, related to CUDA running out of memory:
sh scripts/run_car.sh
Load config from yml file: configs/car.yml
Loading configs from configs/car.yml
{'checkpoint_dir': 'results/car', 'save_checkpoint_freq': 500, 'keep_num_checkpoint': 2, 'use_logger': True, 'log_freq': 100, 'joint_train': False, 'independent': False, 'reset_weight': True, 'save_results': True, 'num_stage': 4, 'flip1_cfg': [False, False, False, False], 'flip3_cfg': [False, False, False, False], 'stage_len_dict': {'step1': 700, 'step2': 700, 'step3': 600}, 'stage_len_dict2': {'step1': 200, 'step2': 500, 'step3': 400}, 'image_size': 128, 'load_gt_depth': False, 'img_list_path': 'data/car/list.txt', 'img_root': 'data/car', 'latent_root': 'data/car/latents', 'model_name': 'gan2shape_car', 'category': 'car', 'share_weight': False, 'relative_enc': False, 'use_mask': True, 'add_mean_L': True, 'add_mean_V': True, 'min_depth': 0.9, 'max_depth': 1.1, 'xyz_rotation_range': 60, 'xy_translation_range': 0.1, 'z_translation_range': 0, 'collect_iters': 100, 'batchsize': 8, 'lr': 0.0001, 'lam_perc': 0.5, 'lam_smooth': 0.01, 'lam_regular': 0.01, 'view_mvn_path': 'checkpoints/view_light/view_mvn.pth', 'light_mvn_path': 'checkpoints/view_light/light_mvn.pth', 'rand_light': [-1, 1, -0.2, 0.8, -0.1, 0.6, -0.6], 'channel_multiplier': 2, 'gan_size': 512, 'gan_ckpt': 'checkpoints/stylegan2/stylegan2-car-config-f.pt', 'F1_d': 2, 'rot_center_depth': 1.0, 'fov': 10, 'tex_cube_size': 2, 'config': 'configs/car.yml', 'seed': 0, 'num_workers': 2, 'distributed': True}
Setting up Perceptual loss...
Loading model from: /home/darkayserleo/Documentos/Tesis/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/weights/v0.1/vgg.pth
...[net-lin [vgg]] initialized
...Done
Loading images...
Traceback (most recent call last):
File "run.py", line 34, in <module>
trainer.train()
File "/home/darkayserleo/Documentos/Tesis/GAN2Shape/gan2shape/trainer.py", line 158, in train
self.setup_data(epoch)
File "/home/darkayserleo/Documentos/Tesis/GAN2Shape/gan2shape/trainer.py", line 78, in setup_data
self.latent_list[epoch])
File "/home/darkayserleo/Documentos/Tesis/GAN2Shape/gan2shape/model.py", line 149, in setup_target
self.load_latent()
File "/home/darkayserleo/Documentos/Tesis/GAN2Shape/gan2shape/model.py", line 248, in load_latent
self.latent_w, self.gan_im = get_w_img(self.w_path)
File "/home/darkayserleo/Documentos/Tesis/GAN2Shape/gan2shape/model.py", line 227, in get_w_img
truncation=self.truncation, randomize_noise=False)
File "/home/darkayserleo/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/darkayserleo/Documentos/Tesis/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/model.py", line 595, in forward
out = conv2(out, latent[:, i + 1], noise=noise2)
File "/home/darkayserleo/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/darkayserleo/Documentos/Tesis/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/model.py", line 350, in forward
out = self.conv(input, style)
File "/home/darkayserleo/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/darkayserleo/Documentos/Tesis/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/model.py", line 287, in forward
out = F.conv2d(input, weight, padding=self.padding, groups=batch)
RuntimeError: CUDA out of memory. Tried to allocate 72.00 MiB (GPU 0; 1.95 GiB total capacity; 901.72 MiB already allocated; 99.88 MiB free; 928.00 MiB reserved in total by PyTorch)
Traceback (most recent call last):
File "/home/darkayserleo/anaconda3/envs/unsup3d/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/darkayserleo/anaconda3/envs/unsup3d/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/darkayserleo/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
main()
File "/home/darkayserleo/anaconda3/envs/unsup3d/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/darkayserleo/anaconda3/envs/unsup3d/bin/python', '-u', 'run.py', '--local_rank=0', '--launcher', 'pytorch', '--config', 'configs/car.yml']' returned non-zero exit status 1.
I don't know how to fix it. I read some forums, and they say I need to change the batch size and/or the number of workers; is that right? What else do I need to change to run this demo? Please help.
Thanks in advance,
Leonel
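For anyone reading along: the values being discussed live in configs/car.yml, and they appear in the config dump above as 'batchsize' and 'num_workers'. A rough sketch of that edit is below (the YAML layout is guessed from the printed dict). Note, though, that the traceback above dies inside the StyleGAN2 generator while reconstructing the GAN image, before any training batch is formed, so on a 1.95 GiB card the model itself may simply not fit regardless of batch size; num_workers only controls CPU-side dataloader processes and does not free GPU memory.

# configs/car.yml (excerpt, guessed layout)
batchsize: 2     # default is 8; fewer images per optimization step
num_workers: 1   # dataloader workers; affects host RAM, not GPU memory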
Hello, I have encountered the same problem; changing the batch size / number of workers does not work. Have you solved it yet? Looking forward to your reply. Thanks!
(base) [yshan@saturn12 GAN2Shape]$ sh scripts/run_car.sh
/data/yshan/anaconda3/lib/python3.9/site-packages/torch/distributed/launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank
argument to be set, please
change it to read from os.environ['LOCAL_RANK']
instead. See
https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
/data/yshan/anaconda3/lib/python3.9/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
warnings.warn(
StyleGAN2: Optimized CUDA op FusedLeakyReLU not available, using native PyTorch fallback.
StyleGAN2: Optimized CUDA op UpFirDn2d not available, using native PyTorch fallback.
Load config from yml file: configs/car.yml
Loading configs from configs/car.yml
{'checkpoint_dir': 'results/car', 'save_checkpoint_freq': 500, 'keep_num_checkpoint': 2, 'use_logger': True, 'log_freq': 100, 'joint_train': False, 'independent': False, 'reset_weight': True, 'save_results': True, 'num_stage': 4, 'flip1_cfg': [False, False, False, False], 'flip3_cfg': [False, False, False, False], 'stage_len_dict': {'step1': 700, 'step2': 700, 'step3': 600}, 'stage_len_dict2': {'step1': 200, 'step2': 500, 'step3': 400}, 'image_size': 128, 'load_gt_depth': False, 'img_list_path': 'data/car/list.txt', 'img_root': 'data/car', 'latent_root': 'data/car/latents', 'model_name': 'gan2shape_car', 'category': 'car', 'share_weight': True, 'relative_enc': False, 'use_mask': True, 'add_mean_L': True, 'add_mean_V': True, 'min_depth': 0.9, 'max_depth': 1.1, 'xyz_rotation_range': 60, 'xy_translation_range': 0.1, 'z_translation_range': 0, 'collect_iters': 100, 'batchsize': 8, 'lr': 0.0001, 'lam_perc': 0.5, 'lam_smooth': 0.01, 'lam_regular': 0.01, 'view_mvn_path': 'checkpoints/view_light/view_mvn.pth', 'light_mvn_path': 'checkpoints/view_light/light_mvn.pth', 'rand_light': [-1, 1, -0.2, 0.8, -0.1, 0.6, -0.6], 'channel_multiplier': 2, 'gan_size': 512, 'gan_ckpt': 'checkpoints/stylegan2/stylegan2-car-config-f.pt', 'F1_d': 2, 'rot_center_depth': 1.0, 'fov': 10, 'tex_cube_size': 2, 'config': 'configs/car.yml', 'seed': 0, 'num_workers': 4, 'distributed': True}
Setting up Perceptual loss...
/data/yshan/anaconda3/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/data/yshan/anaconda3/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=VGG16_Weights.IMAGENET1K_V1. You can also use weights=VGG16_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
Loading model from: /data/yshan/GAN2Shape/gan2shape/stylegan2/stylegan2-pytorch/lpips/weights/v0.1/vgg.pth
...[net-lin [vgg]] initialized
...Done
Traceback (most recent call last):
File "/data/yshan/GAN2Shape/run.py", line 31, in <module>
trainer = Trainer(cfgs, GAN2Shape)
File "/data/yshan/GAN2Shape/gan2shape/trainer.py", line 23, in __init__
self.model = model(cfgs)
File "/data/yshan/GAN2Shape/gan2shape/model.py", line 92, in __init__
self.renderer = Renderer(cfgs, self.image_size)
File "/data/yshan/GAN2Shape/gan2shape/renderer/renderer.py", line 44, in __init__
self.inv_K_origin = torch.inverse(K).unsqueeze(0)
RuntimeError: Error in dlopen: libtorch_cuda_linalg.so: cannot open shared object file: No such file or directory
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 209663) of binary: /data/yshan/anaconda3/bin/python
Traceback (most recent call last):
File "/data/yshan/anaconda3/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/data/yshan/anaconda3/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data/yshan/anaconda3/lib/python3.9/site-packages/torch/distributed/launch.py", line 195, in
main()
File "/data/yshan/anaconda3/lib/python3.9/site-packages/torch/distributed/launch.py", line 191, in main
launch(args)
File "/data/yshan/anaconda3/lib/python3.9/site-packages/torch/distributed/launch.py", line 176, in launch
run(args)
File "/data/yshan/anaconda3/lib/python3.9/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/data/yshan/anaconda3/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/data/yshan/anaconda3/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
run.py FAILED
Failures:
<NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2023-03-10_00:31:49
host : saturn12.ihpc.uts.edu.au
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 209663)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
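This last error is different from the OOM earlier in the thread: libtorch_cuda_linalg.so is a shared library that PyTorch appears to load lazily the first time a CUDA linear-algebra op such as torch.inverse runs, so the dlopen failure points at the PyTorch installation rather than at GAN2Shape. A minimal check under that assumption (standalone, no GAN2Shape code involved):

import torch

# If this raises the same "Error in dlopen: libtorch_cuda_linalg.so" message,
# the PyTorch build itself is broken or mismatched with the local CUDA setup,
# and reinstalling PyTorch is the usual first thing to try.
K = torch.eye(3, device='cuda' if torch.cuda.is_available() else 'cpu')
print(torch.inverse(K))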
Hi, I got this error when I use your config:
EXP=car
CONFIG=car
GPUS=1
PORT=${PORT:-29577}
mkdir -p results/${EXP}
CUDA_VISIBLE_DEVICES=0 \
python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \
run.py \
--launcher pytorch \
--config configs/${CONFIG}.yml \
2>&1 | tee results/${EXP}/log.txt