PF-AFN

Issues while training on Google Colab

Open atulnagane45 opened this issue 3 years ago • 2 comments

    Traceback (most recent call last):
      File "train_PBAFN_stage1.py", line 30, in <module>
        torch.cuda.set_device(opt.local_rank)
      File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 261, in set_device
        torch._C._cuda_setDevice(device)
    RuntimeError: CUDA error: invalid device ordinal

(The traceback above appears seven times in the output, interleaved across the worker processes.)

    Killing subprocess 253
    Killing subprocess 254
    Killing subprocess 255
    Killing subprocess 256
    Killing subprocess 257
    Killing subprocess 258
    Killing subprocess 259
    Killing subprocess 260
    Traceback (most recent call last):
      File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 340, in <module>
        main()
      File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 326, in main
        sigkill_handler(signal.SIGTERM, None)  # not coming back
      File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 301, in sigkill_handler
        raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
    subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'train_PBAFN_stage1.py', '--local_rank=7', '--name', 'PBAFN_stage1', '--resize_or_crop', 'None', '--verbose', '--tf_log', '--batchSize', '1', '--num_gpus', '1', '--label_nc', '7', '--launcher', 'pytorch']' returned non-zero exit status 1.

atulnagane45, May 07 '21 06:05


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

    ------------ Options -------------
    PBAFN_gen_checkpoint: None
    PBAFN_warp_checkpoint: None
    PFAFN_gen_checkpoint: None
    PFAFN_warp_checkpoint: None
    batchSize: 1
    beta1: 0.5
    checkpoints_dir: ./checkpoints
    continue_train: False
    data_type: 32
    dataroot: dataset/VITON_traindata/
    debug: False
    display_freq: 100
    display_winsize: 512
    fineSize: 512
    gpu_ids: [0]
    input_nc: 3
    isTrain: True
    label_nc: 7
    lambda_feat: 10.0
    launcher: pytorch
    loadSize: 512
    load_pretrain:
    local_rank: 0
    lr: 5e-05
    max_dataset_size: inf
    nThreads: 1
    n_blocks_global: 4
    n_blocks_local: 3
    n_downsample_global: 4
    n_layers_D: 3
    n_local_enhancers: 1
    name: PBAFN_stage1
    ndf: 64
    netG: global
    ngf: 64
    niter: 50
    niter_decay: 50
    niter_fix_global: 0
    no_flip: False
    no_ganFeat_loss: False
    no_html: False
    no_lsgan: False
    no_vgg_loss: False
    norm: instance
    num_D: 2
    num_gpus: 1
    output_nc: 3
    phase: train
    pool_size: 0
    print_freq: 100
    resize_or_crop: None
    save_epoch_freq: 20
    save_latest_freq: 1000
    serial_batches: False
    tf_log: True
    tv_weight: 0.1
    use_dropout: False
    verbose: True
    which_epoch: latest
    -------------- End ----------------

(The options block above is printed once per worker process, interleaved in the raw output. The copies differ only in local_rank, which takes the values 0 through 7.)

    Traceback (most recent call last):
      File "train_PBAFN_stage1.py", line 30, in <module>
        torch.cuda.set_device(opt.local_rank)
      File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 261, in set_device
        torch._C._cuda_setDevice(device)
    RuntimeError: CUDA error: invalid device ordinal

(The traceback above appears seven times in the output, interleaved across the worker processes.)

    Killing subprocess 253
    Killing subprocess 254
    Killing subprocess 255
    Killing subprocess 256
    Killing subprocess 257
    Killing subprocess 258
    Killing subprocess 259
    Killing subprocess 260
    Traceback (most recent call last):
      File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 340, in <module>
        main()
      File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 326, in main
        sigkill_handler(signal.SIGTERM, None)  # not coming back
      File "/usr/local/lib/python3.7/dist-packages/torch/distributed/launch.py", line 301, in sigkill_handler
        raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
    subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'train_PBAFN_stage1.py', '--local_rank=7', '--name', 'PBAFN_stage1', '--resize_or_crop', 'None', '--verbose', '--tf_log', '--batchSize', '1', '--num_gpus', '1', '--label_nc', '7', '--launcher', 'pytorch']' returned non-zero exit status 1.

atulnagane45, May 07 '21 06:05

@atulnagane45 Any resolution for this issue? I was trying a multi-GPU setup and hit the same error.

SahilDhull, Jun 02 '21 12:06
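
One possible reading of the log above, offered as a sketch rather than a confirmed fix: the launcher started eight worker processes (subprocesses 253 to 260, with `local_rank` values 0 through 7), while the options show `num_gpus: 1` and `gpu_ids: [0]`. `torch.cuda.set_device` raises `invalid device ordinal` when the index it receives is not smaller than the number of visible GPUs, so on a single-GPU machine every rank except 0 would fail. The helper names below (`validate_local_rank`, `failing_ranks`) are hypothetical and not part of PF-AFN or PyTorch; the device count is passed in as a plain integer so the check can be read and run without a GPU.

```python
from typing import List


def validate_local_rank(local_rank: int, device_count: int) -> int:
    """Return local_rank if it names a visible GPU, otherwise raise.

    Hypothetical helper: device_count is expected to come from
    torch.cuda.device_count() in the real training script.
    """
    if device_count <= 0:
        raise RuntimeError("No CUDA devices are visible to this process.")
    if not 0 <= local_rank < device_count:
        raise ValueError(
            f"local_rank={local_rank} is out of range for {device_count} "
            f"visible GPU(s); launch with --nproc_per_node={device_count}."
        )
    return local_rank


def failing_ranks(num_processes: int, device_count: int) -> List[int]:
    """List the local ranks that have no matching GPU index."""
    return [rank for rank in range(num_processes) if rank >= device_count]


# Eight processes on one GPU: ranks 1..7 have no device to bind to,
# which matches the seven identical tracebacks in the log.
print(failing_ranks(8, 1))  # [1, 2, 3, 4, 5, 6, 7]

# Assumed usage inside the training script, where opt.local_rank is the
# value already passed to torch.cuda.set_device in the traceback:
#   import torch
#   torch.cuda.set_device(
#       validate_local_rank(opt.local_rank, torch.cuda.device_count())
#   )
```

If this reading is right, starting `torch.distributed.launch` with `--nproc_per_node` equal to the number of GPUs actually available (1 on a typical Colab runtime) should avoid the error. For the multi-GPU case, it may be worth checking that `CUDA_VISIBLE_DEVICES` does not hide some of the devices from the worker processes.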