
Training with your own dataset does not work

Open ginpigin opened this issue 3 years ago • 37 comments

When creating a dataset, I get empty folders, although I added the original images and a mask to the required directories according to the instructions. For example, running

    python make_pix2pix_dataset.py --datadir ../datasets/draw/face --hd --outsize 512 --fold 1 --name face --savedir ../datasets/pix2pix/face --mod drawn --minsize 128 --square

prints:

    ../datasets/pix2pix/face existed
    ../datasets/pix2pix/face\train_A existed
    ../datasets/pix2pix/face\train_B existed
    segment parameters: 12.4M
    Find images: 1

It looks like processing is going on, but the folders remain empty. Only the "opt" file and the empty "train_A" and "train_B" folders appear.

    (deep) PS E:\DeepMosaics\train\clean> python train.py --dataset ../../datasets/video/face --savename face --n_blocks 4 --lambda_GAN 0.01 --loadsize 286 --finesize 256 --batchsize 16 --n_layers_D 2 --num_D 3 --n_epoch 200 --gpu_id 4,5,6,7 --load_thread 8
    checkpoints\face existed
    Please run "tensorboard --logdir checkpoints/tensorboardX --host=your_server_ip" and input "2021-06-12_23-06-24" to filter outputs
    checkpoints\face existed
    Please run "tensorboard --logdir checkpoints/tensorboardX --host=your_server_ip" and input "2021-06-12_23-06-26" to filter outputs
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\spawn.py", line 116, in spawn_main
        exitcode = _main(fd, parent_sentinel)
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\spawn.py", line 125, in _main
        prepare(preparation_data)
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\spawn.py", line 236, in prepare
        _fixup_main_from_path(data['init_main_from_path'])
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
        main_content = runpy.run_path(main_path,
      File "C:\ProgramData\Anaconda31\envs\deep\lib\runpy.py", line 268, in run_path
        return _run_module_code(code, init_globals, run_name,
      File "C:\ProgramData\Anaconda31\envs\deep\lib\runpy.py", line 97, in _run_module_code
        _run_code(code, mod_globals, init_globals,
      File "C:\ProgramData\Anaconda31\envs\deep\lib\runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "E:\DeepMosaics\train\clean\train.py", line 117, in <module>
        Videodataloader_train = dataloader.VideoDataLoader(opt, videolist_train)
      File "E:\DeepMosaics\train\clean../..\util\dataloader.py", line 115, in __init__
        self.load_init()
      File "E:\DeepMosaics\train\clean../..\util\dataloader.py", line 138, in load_init
        p.start()
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\process.py", line 121, in start
        self._popen = self._Popen(self)
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\context.py", line 224, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\context.py", line 327, in _Popen
        return Popen(process_obj)
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
        _check_not_importing_main()
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
        raise RuntimeError('''
    RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

ginpigin avatar Jun 12 '21 13:06 ginpigin

When creating the dataset, the script reports only "Find images: 1". If that single image does not pass the filter, the output will be empty, so try adding more images. As for the second problem: Python multiprocessing is difficult to get working on Windows. Try changing the number of processes to 1, or run it on Linux.
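For anyone hitting the same RuntimeError: the idiom the error message asks for can be sketched as below. This is only a minimal illustration and not the DeepMosaics code. The names `square`, `worker` and `main` are made up for the example, and the "spawn" start method is requested explicitly only so that the behaviour matches Windows on any platform.

```python
import multiprocessing as mp


def square(index):
    # Hypothetical stand-in for the work a loader process would do.
    return index * index


def worker(queue, index):
    # Runs in the child process and sends its result back to the parent.
    queue.put(square(index))


def main():
    # Everything that starts processes lives inside a function, so a child
    # that re-imports this file under "spawn" does not start processes itself.
    ctx = mp.get_context("spawn")  # the start method used on Windows
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(queue, i)) for i in range(3)]
    for proc in procs:
        proc.start()
    # A timeout keeps the parent from waiting forever if a child fails.
    results = sorted(queue.get(timeout=30) for _ in procs)
    for proc in procs:
        proc.join()
    return results


if __name__ == "__main__":
    mp.freeze_support()  # only needed when frozen into an executable
    print(main())
```

With "spawn", each child imports the main script again. Code that sits at module level and calls `p.start()` therefore runs a second time inside the child, which is what the traceback above shows happening at `train.py`, line 117. Moving that code under the `if __name__ == "__main__":` guard is the change the error message suggests.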

HypoX64 avatar Jun 16 '21 11:06 HypoX64

The number of images does not affect the result.

ginpigin avatar Jun 23 '21 18:06 ginpigin

Same issue here, although the model I created works fine when I use it to add mosaics (on Linux).

It always stops at 2 images to the end with

It works for video.

What are the filter settings?

ethanfel avatar Jun 24 '21 23:06 ethanfel

No dataset for pix2pix addmosaic is created, only empty folders. The command was:

    python make_pix2pix_dataset.py --datadir ../datasets/draw/face --hd --outsize 512 --fold 1 --name face --savedir ../datasets/pix2pix/face --mod drawn --minsize 128 --square

Training the model with a video dataset does not start either. The command

    python make_video_dataset.py --model_path ../pretrained_models/mosaic/add_face.pth --gpu_id 0 --datadir 'dir for your videos' --savedir ../datasets/video/face

gives:

    checkpoints\face existed
    Please run "tensorboard --logdir checkpoints/tensorboardX --host=your_server_ip" and input "2021-06-12_23-06-24" to filter outputs
    checkpoints\face existed
    Please run "tensorboard --logdir checkpoints/tensorboardX --host=your_server_ip" and input "2021-06-12_23-06-26" to filter outputs
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\spawn.py", line 116, in spawn_main
        exitcode = _main(fd, parent_sentinel)
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\spawn.py", line 125, in _main
        prepare(preparation_data)
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\spawn.py", line 236, in prepare
        _fixup_main_from_path(data['init_main_from_path'])
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
        main_content = runpy.run_path(main_path,
      File "C:\ProgramData\Anaconda31\envs\deep\lib\runpy.py", line 268, in run_path
        return _run_module_code(code, init_globals, run_name,
      File "C:\ProgramData\Anaconda31\envs\deep\lib\runpy.py", line 97, in _run_module_code
        _run_code(code, mod_globals, init_globals,
      File "C:\ProgramData\Anaconda31\envs\deep\lib\runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "E:\DeepMosaics\train\clean\train.py", line 117, in <module>
        Videodataloader_train = dataloader.VideoDataLoader(opt, videolist_train)
      File "E:\DeepMosaics\train\clean../..\util\dataloader.py", line 115, in __init__
        self.load_init()
      File "E:\DeepMosaics\train\clean../..\util\dataloader.py", line 138, in load_init
        p.start()
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\process.py", line 121, in start
        self._popen = self._Popen(self)
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\context.py", line 224, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\context.py", line 327, in _Popen
        return Popen(process_obj)
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
        prep_data = spawn.get_preparation_data(process_obj._name)
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
        _check_not_importing_main()
      File "C:\ProgramData\Anaconda31\envs\deep\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
        raise RuntimeError('''
    RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

ginpigin avatar Jun 25 '21 12:06 ginpigin

I've been able to make the dataset myself. You put the masks in the A folder and the source images in the B folder. It works.

To use my mosaic model, I plan on making a video consisting of all my source files (the png or jpg images) and building the dataset and training with it in the meantime.

I would like to be able to conform my sources to the filter, but I'm lacking data.
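A rough sketch of building the folders by hand, in case it helps. It assumes each mask and its source image share the same file name stem (for example `0001.png` and `0001.jpg`), and that the trainer pairs the two folders by sorted file name, which I have not verified. `build_pairs` and its arguments are names made up for this example and are not part of DeepMosaics.

```python
import shutil
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def build_pairs(mask_dir, image_dir, out_dir):
    """Copy masks to out_dir/train_A and matching images to out_dir/train_B.

    Pairs are matched by file name stem (assumption); returns the number
    of pairs copied.
    """
    mask_dir, image_dir, out_dir = Path(mask_dir), Path(image_dir), Path(out_dir)
    train_a = out_dir / "train_A"
    train_b = out_dir / "train_B"
    train_a.mkdir(parents=True, exist_ok=True)
    train_b.mkdir(parents=True, exist_ok=True)

    # Index the masks by stem so each image can look up its partner.
    masks = {
        path.stem: path
        for path in mask_dir.iterdir()
        if path.suffix.lower() in IMAGE_EXTS
    }

    copied = 0
    for image in sorted(image_dir.iterdir()):
        mask = masks.get(image.stem)
        if image.suffix.lower() not in IMAGE_EXTS or mask is None:
            continue  # skip non-images and images without a mask
        shutil.copy2(mask, train_a / mask.name)
        shutil.copy2(image, train_b / image.name)
        copied += 1
    return copied
```

For example, `build_pairs("masks", "images", "../datasets/pix2pix/face")` would fill `train_A` and `train_B` under that directory, where `masks` and `images` are placeholder folder names. Images without a mask are skipped, so both folders end up with the same number of files.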

ethanfel avatar Jun 25 '21 13:06 ethanfel

On your advice, the addmosaic training now starts, but the clean mosaic training still doesn't work. With the video dataset I get the following (the output of the two processes is interleaved):

    (deep2) PS E:\DeepMosaics\train\clean> python train.py --dataset ../../datasets/video/face --savename face --n_blocks 4 --lambda_GAN 0.01 --loadsize 286 --finesize 256 --batchsize 16 --n_layers_D 2 --num_D 3 --n_epoch 200 --gpu_id 0 --load_thread 1
    checkpoints\face existed
    Please run "tensorboard --logdir checkpoints/tensorboardX --host=your_server_ip" and input "2021-06-27_04-39-25" to filter outputs
    checkpoints\face existed
    Please run "tensorboard --logdir checkpoints/tensorboardX --host=your_server_ip" and input "2021-06-27_04-39-29" to filter outputs
    Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\spawn.py", line 105, in spawn_main exitcode = _main(fd)
    File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\spawn.py", line 114, in _main prepare(preparation_data)
    File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\spawn.py", line 225, in prepare _fixup_main_from_path(data['init_main_from_path'])
    File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path run_name="__mp_main__")
    File "C:\ProgramData\Anaconda31\envs\deep2\lib\runpy.py", line 263, in run_path pkg_name=pkg_name, script_name=fname)
    File "C:\ProgramData\Anaconda31\envs\deep2\lib\runpy.py", line 96, in _run_module_code Traceback (most recent call last): mod_name, mod_spec, pkg_name, script_name)
    File "train.py", line 117, in <module>
    File "C:\ProgramData\Anaconda31\envs\deep2\lib\runpy.py", line 85, in _run_code Videodataloader_train = dataloader.VideoDataLoader(opt, videolist_train)
    File "../..\util\dataloader.py", line 115, in __init__ exec(code, run_globals) self.load_init()
    File "E:\DeepMosaics\train\clean\train.py", line 117, in <module>
    File "../..\util\dataloader.py", line 138, in load_init Videodataloader_train = dataloader.VideoDataLoader(opt, videolist_train) p.start()
    File "../..\util\dataloader.py", line 115, in __init__
    File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\process.py", line 112, in start self.load_init()
    File "../..\util\dataloader.py", line 138, in load_init self._popen = self._Popen(self) p.start()
    File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\context.py", line 223, in _Popen
    File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\process.py", line 112, in start return _default_context.get_context().Process._Popen(process_obj)
    File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\context.py", line 322, in _Popen self._popen = self._Popen(self)
    File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\context.py", line 223, in _Popen return Popen(process_obj)
    File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__ return _default_context.get_context().Process._Popen(process_obj)
    File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\context.py", line 322, in _Popen reduction.dump(process_obj, to_child)
    File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\reduction.py", line 60, in dump return Popen(process_obj)
    File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__ ForkingPickler(file, protocol).dump(obj) BrokenPipeError: [Errno 32] Broken pipe prep_data = spawn.get_preparation_data(process_obj._name)
    File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\spawn.py", line 143, in get_preparation_data _check_not_importing_main()
    File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main is not going to be frozen to produce an executable.''')
    RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

ginpigin avatar Jun 26 '21 21:06 ginpigin

For image datasets, pix2pix doesn't work either. Only an empty checkpoint/web directory is created:

    (deep2) PS E:\DeepMosaics\pix2pixHD> python train.py --name face --resize_or_crop resize_and_crop --loadSize 563 --fineSize 512 --label_nc 0 --no_instance --dataroot ../datasets/pix2pix/face
    ------------ Options -------------
    batchSize: 1
    beta1: 0.5
    checkpoints_dir: ./checkpoints
    continue_train: False
    data_type: 32
    dataroot: ../datasets/pix2pix/face
    debug: False
    display_freq: 100
    display_winsize: 512
    feat_num: 3
    fineSize: 512
    fp16: False
    gpu_ids: [0]
    input_nc: 3
    instance_feat: False
    isTrain: True
    label_feat: False
    label_nc: 0
    lambda_feat: 10.0
    loadSize: 563
    load_features: False
    load_pretrain:
    local_rank: 0
    lr: 0.0002
    max_dataset_size: inf
    model: pix2pixHD
    nThreads: 2
    n_blocks_global: 9
    n_blocks_local: 3
    n_clusters: 10
    n_downsample_E: 4
    n_downsample_global: 4
    n_layers_D: 3
    n_local_enhancers: 1
    name: face
    ndf: 64
    nef: 16
    netG: global
    ngf: 64
    niter: 100
    niter_decay: 100
    niter_fix_global: 0
    no_flip: False
    no_ganFeat_loss: False
    no_html: False
    no_instance: True
    no_lsgan: False
    no_vgg_loss: False
    norm: instance
    num_D: 2
    output_nc: 3
    phase: train
    pool_size: 0
    print_freq: 100
    resize_or_crop: resize_and_crop
    save_epoch_freq: 10
    save_latest_freq: 1000
    serial_batches: False
    tf_log: False
    use_dropout: False
    verbose: False
    which_epoch: latest
    -------------- End ----------------
    train.py:9: DeprecationWarning: fractions.gcd() is deprecated. Use math.gcd() instead.
def lcm(a,b): return abs(a * b)/fractions.gcd(a,b) if a and b else 0 CustomDatasetDataLoader dataset [AlignedDataset] was created #training images = 1260 GlobalGenerator( (model): Sequential( (0): ReflectionPad2d((3, 3, 3, 3)) (1): Conv2d(3, 64, kernel_size=(7, 7), stride=(1, 1)) (2): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (5): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (6): ReLU(inplace=True) (7): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (8): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (9): ReLU(inplace=True) (10): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (11): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (12): ReLU(inplace=True) (13): Conv2d(512, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (14): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (15): ReLU(inplace=True) (16): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (17): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, 
affine=False, track_running_stats=False) ) ) (18): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (19): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (20): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (21): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (22): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, 
eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (23): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (24): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (25): ConvTranspose2d(1024, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (26): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (27): ReLU(inplace=True) (28): ConvTranspose2d(512, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (29): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (30): ReLU(inplace=True) (31): ConvTranspose2d(256, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (32): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (33): ReLU(inplace=True) (34): ConvTranspose2d(128, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (35): InstanceNorm2d(64, eps=1e-05, momentum=0.1, 
affine=False, track_running_stats=False) (36): ReLU(inplace=True) (37): ReflectionPad2d((3, 3, 3, 3)) (38): Conv2d(64, 3, kernel_size=(7, 7), stride=(1, 1)) (39): Tanh() ) ) MultiscaleDiscriminator( (scale0_layer0): Sequential( (0): Conv2d(6, 64, kernel_size=(4, 4), stride=(2, 2), padding=(2, 2)) (1): LeakyReLU(negative_slope=0.2, inplace=True) ) (scale0_layer1): Sequential( (0): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(2, 2)) (1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (2): LeakyReLU(negative_slope=0.2, inplace=True) ) (scale0_layer2): Sequential( (0): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(2, 2)) (1): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (2): LeakyReLU(negative_slope=0.2, inplace=True) ) (scale0_layer3): Sequential( (0): Conv2d(256, 512, kernel_size=(4, 4), stride=(1, 1), padding=(2, 2)) (1): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (2): LeakyReLU(negative_slope=0.2, inplace=True) ) (scale0_layer4): Sequential( (0): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), padding=(2, 2)) ) (scale1_layer0): Sequential( (0): Conv2d(6, 64, kernel_size=(4, 4), stride=(2, 2), padding=(2, 2)) (1): LeakyReLU(negative_slope=0.2, inplace=True) ) (scale1_layer1): Sequential( (0): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(2, 2)) (1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (2): LeakyReLU(negative_slope=0.2, inplace=True) ) (scale1_layer2): Sequential( (0): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(2, 2)) (1): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (2): LeakyReLU(negative_slope=0.2, inplace=True) ) (scale1_layer3): Sequential( (0): Conv2d(256, 512, kernel_size=(4, 4), stride=(1, 1), padding=(2, 2)) (1): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, 
track_running_stats=False) (2): LeakyReLU(negative_slope=0.2, inplace=True) ) (scale1_layer4): Sequential( (0): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), padding=(2, 2)) ) (downsample): AvgPool2d(kernel_size=3, stride=2, padding=[1, 1]) ) create web directory ./checkpoints\face\web... ------------ Options ------------- batchSize: 1 beta1: 0.5 checkpoints_dir: ./checkpoints continue_train: False data_type: 32 dataroot: ../datasets/pix2pix/face debug: False display_freq: 100 display_winsize: 512 feat_num: 3 fineSize: 512 fp16: False gpu_ids: [0] input_nc: 3 instance_feat: False isTrain: True label_feat: False label_nc: 0 lambda_feat: 10.0 loadSize: 563 load_features: False load_pretrain: local_rank: 0 lr: 0.0002 max_dataset_size: inf model: pix2pixHD nThreads: 2 n_blocks_global: 9 n_blocks_local: 3 n_clusters: 10 n_downsample_E: 4 n_downsample_global: 4 n_layers_D: 3 n_local_enhancers: 1 name: face ndf: 64 nef: 16 netG: global ngf: 64 niter: 100 niter_decay: 100 niter_fix_global: 0 no_flip: False no_ganFeat_loss: False no_html: False no_instance: True no_lsgan: False no_vgg_loss: False norm: instance num_D: 2 output_nc: 3 phase: train pool_size: 0 print_freq: 100 resize_or_crop: resize_and_crop save_epoch_freq: 10 save_latest_freq: 1000 serial_batches: False tf_log: False use_dropout: False verbose: False which_epoch: latest -------------- End ---------------- CustomDatasetDataLoader dataset [AlignedDataset] was created #training images = 1260 GlobalGenerator( (model): Sequential( (0): ReflectionPad2d((3, 3, 3, 3)) (1): Conv2d(3, 64, kernel_size=(7, 7), stride=(1, 1)) (2): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (5): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (6): ReLU(inplace=True) (7): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (8): InstanceNorm2d(256, 
eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (9): ReLU(inplace=True) (10): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (11): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (12): ReLU(inplace=True) (13): Conv2d(512, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) (14): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (15): ReLU(inplace=True) (16): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (17): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (18): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (19): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, 
affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (20): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (21): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (22): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (23): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, 
affine=False, track_running_stats=False) ) ) (24): ResnetBlock( (conv_block): Sequential( (0): ReflectionPad2d((1, 1, 1, 1)) (1): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (2): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (3): ReLU(inplace=True) (4): ReflectionPad2d((1, 1, 1, 1)) (5): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1)) (6): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) ) ) (25): ConvTranspose2d(1024, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (26): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (27): ReLU(inplace=True) (28): ConvTranspose2d(512, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (29): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (30): ReLU(inplace=True) (31): ConvTranspose2d(256, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (32): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (33): ReLU(inplace=True) (34): ConvTranspose2d(128, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1)) (35): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (36): ReLU(inplace=True) (37): ReflectionPad2d((3, 3, 3, 3)) (38): Conv2d(64, 3, kernel_size=(7, 7), stride=(1, 1)) (39): Tanh() ) ) MultiscaleDiscriminator( (scale0_layer0): Sequential( (0): Conv2d(6, 64, kernel_size=(4, 4), stride=(2, 2), padding=(2, 2)) (1): LeakyReLU(negative_slope=0.2, inplace=True) ) (scale0_layer1): Sequential( (0): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(2, 2)) (1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (2): LeakyReLU(negative_slope=0.2, inplace=True) ) (scale0_layer2): Sequential( (0): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), 
padding=(2, 2)) (1): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (2): LeakyReLU(negative_slope=0.2, inplace=True) ) (scale0_layer3): Sequential( (0): Conv2d(256, 512, kernel_size=(4, 4), stride=(1, 1), padding=(2, 2)) (1): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (2): LeakyReLU(negative_slope=0.2, inplace=True) ) (scale0_layer4): Sequential( (0): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), padding=(2, 2)) ) (scale1_layer0): Sequential( (0): Conv2d(6, 64, kernel_size=(4, 4), stride=(2, 2), padding=(2, 2)) (1): LeakyReLU(negative_slope=0.2, inplace=True) ) (scale1_layer1): Sequential( (0): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(2, 2)) (1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (2): LeakyReLU(negative_slope=0.2, inplace=True) ) (scale1_layer2): Sequential( (0): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(2, 2)) (1): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (2): LeakyReLU(negative_slope=0.2, inplace=True) ) (scale1_layer3): Sequential( (0): Conv2d(256, 512, kernel_size=(4, 4), stride=(1, 1), padding=(2, 2)) (1): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False) (2): LeakyReLU(negative_slope=0.2, inplace=True) ) (scale1_layer4): Sequential( (0): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), padding=(2, 2)) ) (downsample): AvgPool2d(kernel_size=3, stride=2, padding=[1, 1]) ) create web directory ./checkpoints\face\web... 
Traceback (most recent call last): Traceback (most recent call last): File "", line 1, in File "train.py", line 60, in File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\spawn.py", line 105, in spawn_main for i, data in enumerate(dataset, start=epoch_iter): File "C:\ProgramData\Anaconda31\envs\deep2\lib\site-packages\torch\utils\data\dataloader.py", line 355, in iter exitcode = _main(fd) File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\spawn.py", line 114, in _main return self._get_iterator() File "C:\ProgramData\Anaconda31\envs\deep2\lib\site-packages\torch\utils\data\dataloader.py", line 301, in _get_iterator prepare(preparation_data) File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\spawn.py", line 225, in prepare return _MultiProcessingDataLoaderIter(self) File "C:\ProgramData\Anaconda31\envs\deep2\lib\site-packages\torch\utils\data\dataloader.py", line 914, in init _fixup_main_from_path(data['init_main_from_path']) File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path w.start() File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\process.py", line 112, in start run_name="mp_main") File "C:\ProgramData\Anaconda31\envs\deep2\lib\runpy.py", line 263, in run_path self._popen = self._Popen(self) File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\context.py", line 223, in _Popen pkg_name=pkg_name, script_name=fname) File "C:\ProgramData\Anaconda31\envs\deep2\lib\runpy.py", line 96, in _run_module_code return _default_context.get_context().Process._Popen(process_obj) File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\context.py", line 322, in _Popen mod_name, mod_spec, pkg_name, script_name) File "C:\ProgramData\Anaconda31\envs\deep2\lib\runpy.py", line 85, in _run_code return Popen(process_obj) File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\popen_spawn_win32.py", line 89, in init exec(code, run_globals) File 
"E:\DeepMosaics\pix2pixHD\train.py", line 60, in reduction.dump(process_obj, to_child) for i, data in enumerate(dataset, start=epoch_iter): File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\reduction.py", line 60, in dump File "C:\ProgramData\Anaconda31\envs\deep2\lib\site-packages\torch\utils\data\dataloader.py", line 355, in iter ForkingPickler(file, protocol).dump(obj) BrokenPipeError: [Errno 32] Broken pipe return self._get_iterator() File "C:\ProgramData\Anaconda31\envs\deep2\lib\site-packages\torch\utils\data\dataloader.py", line 301, in _get_iterator return _MultiProcessingDataLoaderIter(self) File "C:\ProgramData\Anaconda31\envs\deep2\lib\site-packages\torch\utils\data\dataloader.py", line 914, in init w.start() File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\process.py", line 112, in start self._popen = self._Popen(self) File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\context.py", line 223, in _Popen return _default_context.get_context().Process._Popen(process_obj) File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\context.py", line 322, in _Popen return Popen(process_obj) File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\popen_spawn_win32.py", line 46, in init prep_data = spawn.get_preparation_data(process_obj._name) File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\spawn.py", line 143, in get_preparation_data _check_not_importing_main() File "C:\ProgramData\Anaconda31\envs\deep2\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main is not going to be frozen to produce an executable.''') RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

ginpigin avatar Jun 26 '21 21:06 ginpigin

I've just tested on Windows with the same environment as on my Linux machine, and I got the exact same error as you.

ethanfel avatar Jun 27 '21 08:06 ethanfel

> I've just tested on Windows with the same environment as on my Linux machine, and I got the exact same error as you.

On Ubuntu, the video cleaning training also does not work; only an empty web folder is created. Pix2pixHD cleaning training works. Creating a clean mosaic dataset with a drawn mask to make pix2pix(HD) datasets also does not work.

ginpigin avatar Jun 28 '21 07:06 ginpigin

> > I've just tested on Windows with the same environment as on my Linux machine, and I got the exact same error as you.
>
> On Ubuntu, the video cleaning training also does not work; only an empty web folder is created. Pix2pixHD cleaning training works. Creating a clean mosaic dataset with a drawn mask to make pix2pix(HD) datasets also does not work.

I've finished my initial add model and made some scripts to sort out the trash produced when creating the video dataset.

I just started training with clean video and it's working.

I've modified the environment a lot, though, to make it work with my RTX 3090, and I added some quality-of-life improvements to the base code. I've documented everything in my fork, if that helps you.

ethanfel avatar Jun 29 '21 17:06 ethanfel

> I've finished my initial add model and made some scripts to sort out the trash produced when creating the video dataset.
>
> I just started training with clean video and it's working.
>
> I've modified the environment a lot, though, to make it work with my RTX 3090, and I added some quality-of-life improvements to the base code. I've documented everything in my fork, if that helps you.

Thank you. I will check if it works on my RTX 2080.

ginpigin avatar Jun 29 '21 19:06 ginpigin

Video cleaning training also does not work; processing does not even start:

(deep1) goger@goger-System-Product-Name:~/DeepMosaicsRT/train/clean$ python train.py --dataset ../../datasets/video/face --savename face --n_blocks 4 --lambda_GAN 0.01 --loadsize 286 --finesize 256 --batchsize 16 --n_layers_D 2 --num_D 3 --n_epoch 200 --gpu_id 4,5,6,7 --load_thread 16
makedir: checkpoints/face
Please run "tensorboard --logdir checkpoints/tensorboard --host=your_server_ip" and input "2021-06-30_03-53-11" to filter outputs
(deep1) goger@goger-System-Product-Name:~/DeepMosaicsRT/train/clean$

Nothing has changed; everything remains the same. Only the file "events.out.tfevents.1624999991.goger-System-Product-Name" is created.

ginpigin avatar Jun 29 '21 20:06 ginpigin

Video cleaning training also does not work; processing does not even start:

(deep1) goger@goger-System-Product-Name:~/DeepMosaicsRT/train/clean$ python train.py --dataset ../../datasets/video/face --savename face --n_blocks 4 --lambda_GAN 0.01 --loadsize 286 --finesize 256 --batchsize 16 --n_layers_D 2 --num_D 3 --n_epoch 200 --gpu_id 4,5,6,7 --load_thread 16
makedir: checkpoints/face
Please run "tensorboard --logdir checkpoints/tensorboard --host=your_server_ip" and input "2021-06-30_03-53-11" to filter outputs
(deep1) goger@goger-System-Product-Name:~/DeepMosaicsRT/train/clean$

Nothing has changed; everything remains the same. Only 1 file is created: "events.out.tfevents.1624999991.goger-System-Product-Name".

ginpigin avatar Jun 29 '21 21:06 ginpigin

Reduce the number of threads, and set gpu_id to 0 if you have only one GPU.

My VM only has 16 GB of RAM; with 6 threads it's saturated, and it didn't work with 16 either. Check your video card memory too, with nvidia-smi.

ethanfel avatar Jun 30 '21 06:06 ethanfel

python train.py --dataset ../../datasets/video/face --savename face --n_blocks 4 --lambda_GAN 0.01 --loadsize 286 --finesize 256 --batchsize 16 --n_layers_D 2 --num_D 3 --n_epoch 200 --gpu_id 0 --load_thread 1

And nothing has changed.

ginpigin avatar Jun 30 '21 11:06 ginpigin

@ethanfel
Good job! I haven't used the dataset-generation code for a long time, and it may have some bugs. I looked at your code, and some of the changes are very effective. I will check this part in my code.

HypoX64 avatar Jun 30 '21 13:06 HypoX64

> python train.py --dataset ../../datasets/video/face --savename face --n_blocks 4 --lambda_GAN 0.01 --loadsize 286 --finesize 256 --batchsize 16 --n_layers_D 2 --num_D 3 --n_epoch 200 --gpu_id 0 --load_thread 1
> and nothing has changed

Check your VRAM. Video training takes more than 16 GB when I'm using it, so that may be the reason.

ethanfel avatar Jun 30 '21 13:06 ethanfel

@ginpigin I will fix my code and show how to determine some parameters. And whenever you see the "freeze_support()" error, it means you have to either set "--load_thread 1" or run on Linux.
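For reference, the idiom that the error message asks for looks roughly like the sketch below. It is not the actual DeepMosaics `train.py`: `pick_load_threads`, `load_one`, and `main` are hypothetical names, and falling back to a single loader on Windows is only an assumption that mirrors the `--load_thread 1` advice above.

```python
import multiprocessing as mp
import sys


def pick_load_threads(requested, platform=sys.platform):
    """Hypothetical helper: use one loader on Windows, else the requested count."""
    if platform.startswith("win"):
        return 1
    return max(1, requested)


def load_one(index):
    """Stand-in for a data-loading worker; it just returns its index."""
    return index


def main(requested_threads=1):
    n_workers = pick_load_threads(requested_threads)
    if n_workers == 1:
        # Single loader: do the work in this process, no child is started.
        return [load_one(0)]
    # Several loaders: child processes are started here, so this function
    # must only be reached through the __main__ guard below.
    with mp.Pool(n_workers) as pool:
        return pool.map(load_one, range(n_workers))


if __name__ == "__main__":
    # On Windows, child processes re-import this file. Without the guard,
    # the re-import would try to start workers again and raise the
    # "bootstrapping phase" RuntimeError.
    mp.freeze_support()  # only needed for frozen executables
    print(main(requested_threads=1))
```

The point of the sketch is only that everything that starts worker processes sits behind `if __name__ == "__main__":`; module-level code that starts workers is what triggers the error when the start method is not fork.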

HypoX64 avatar Jun 30 '21 13:06 HypoX64

> > python train.py --dataset ../../datasets/video/face --savename face --n_blocks 4 --lambda_GAN 0.01 --loadsize 286 --finesize 256 --batchsize 16 --n_layers_D 2 --num_D 3 --n_epoch 200 --gpu_id 0 --load_thread 1
> > and nothing has changed
>
> Check your VRAM. Video training takes more than 16 GB when I'm using it, so that may be the reason.

Yes, I know. When training it, I have to use 4× RTX 2080, and it takes about one week ...

HypoX64 avatar Jun 30 '21 13:06 HypoX64

> @ethanfel Good job! I haven't used the dataset-generation code for a long time, and it may have some bugs. I looked at your code, and some of the changes are very effective. I will check this part in my code.

Nice :D I think I have documented everything. I will improve the manage script so that it automatically renames the last useful folder to remove the gaps created by the rm.

ethanfel avatar Jun 30 '21 13:06 ethanfel

> > > python train.py --dataset ../../datasets/video/face --savename face --n_blocks 4 --lambda_GAN 0.01 --loadsize 286 --finesize 256 --batchsize 16 --n_layers_D 2 --num_D 3 --n_epoch 200 --gpu_id 0 --load_thread 1
> > > and nothing has changed
> >
> > Check your VRAM. Video training takes more than 16 GB when I'm using it, so that may be the reason.
>
> Yes, I know. When training it, I have to use 4× RTX 2080, and it takes about one week ...

Yeah, I think the VRAM requirement may mislead people trying to do the video training with one GPU. I think the network needs more than 16 GB to even start. My other card is a GTX 1080 with 8 GB, and training doesn't start there with the same environment.

I'm currently at iteration 80,000 of my first video training, and the results in TensorBoard look awesome. You can be proud of your work; it works well.

ethanfel avatar Jun 30 '21 13:06 ethanfel

> @ginpigin I will fix my code and show how to determine some parameters. And whenever you see the "freeze_support()" error, it means you have to either set "--load_thread 1" or run on Linux.

I am running on Ubuntu in an Anaconda environment. It does not issue any errors; it just does not work. An empty folder and the file events.out.tfevents.1624999991 are created. Of course, I don't know much about this, but will the memory of the 4 video cards be summed? I thought the memory would still be 8 GB.

ginpigin avatar Jun 30 '21 18:06 ginpigin

> > @ginpigin I will fix my code and show how to determine some parameters. And whenever you see the "freeze_support()" error, it means you have to either set "--load_thread 1" or run on Linux.
>
> I am running on Ubuntu in an Anaconda environment. It does not issue any errors; it just does not work. An empty folder and the file events.out.tfevents.1624999991 are created. Of course, I don't know much about this, but will the memory of the 4 video cards be summed? I thought the memory would still be 8 GB.

Hey, after much reading on how to optimize the use of my RTX, I found this: https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9926-tensor-core-performance-the-ultimate-guide.pdf

To lower the memory use of the network, lower the batch size but keep it a multiple of 8. The same goes for load_thread: a multiple of 8.

You also have to tune it so that it never uses swap, which reduces your iteration speed by a lot (by a factor of 4 for me).

Currently with my RTX 3090 I've kept the batch size at 16 (2×8) and --load_thread at 8 (1×8). It uses 21 GB of VRAM and 25 GB of RAM, and the CPU is at 100%.
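As a small illustration of the "keep it a multiple of 8" suggestion, here is a sketch that rounds a candidate value down to the nearest multiple of 8. The helper name `round_down_to_multiple` is hypothetical, and the rule itself is only the rule of thumb from this comment, not something train.py enforces.

```python
def round_down_to_multiple(value, multiple=8):
    """Largest multiple of `multiple` that is <= value, but never below `multiple`."""
    return max(multiple, (value // multiple) * multiple)


if __name__ == "__main__":
    # Candidate values one might try for --batchsize or --load_thread.
    for candidate in (24, 20, 12, 5):
        print(candidate, "->", round_down_to_multiple(candidate))
```

A value of 20 would become 16, for example, while anything below 8 is raised to 8 in this sketch.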

ethanfel avatar Jul 01 '21 14:07 ethanfel

> Hey, after much reading on how to optimize the use of my RTX, I found this: https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9926-tensor-core-performance-the-ultimate-guide.pdf
>
> To lower the memory use of the network, lower the batch size but keep it a multiple of 8. The same goes for load_thread: a multiple of 8.
>
> You also have to tune it so that it never uses swap, which reduces your iteration speed by a lot (by a factor of 4 for me).
>
> Currently with my RTX 3090 I've increased the batch size to 24 (3×8) and kept --load_thread at 8 (1×8). It uses 21 GB of VRAM and 25 GB of RAM, and the CPU is at 100%.

Thanks, I'll try. Did I understand correctly that I need to set batchsize 8 and load_thread 8 for 8 GB?

ginpigin avatar Jul 02 '21 04:07 ginpigin

You have to tune the batch_size to fit your VRAM and the load_thread to fit your RAM.

A smaller batch_size will have a negative impact on efficiency, though.

ethanfel avatar Jul 02 '21 06:07 ethanfel

> You have to tune the batch_size to fit your VRAM and the load_thread to fit your RAM.
>
> A smaller batch_size will have a negative impact on efficiency, though.

How?

ginpigin avatar Jul 02 '21 10:07 ginpigin

> > You have to tune the batch_size to fit your VRAM and the load_thread to fit your RAM. A smaller batch_size will have a negative impact on efficiency, though.
>
> How?

It's all in the command line.

> python train.py --dataset ../../datasets/video/face --savename face --n_blocks 4 --lambda_GAN 0.01 --loadsize 286 --finesize 256 --batchsize **16** --n_layers_D 2 --num_D 3 --n_epoch 200 --gpu_id 0 --load_thread **1**
> and nothing has changed

Use nvidia-smi and htop to watch your VRAM and RAM while you tune your parameters.

ethanfel avatar Jul 02 '21 11:07 ethanfel

> Use nvidia-smi and htop to watch your VRAM and RAM while you tune your parameters.

    Total                             : 8192 MiB
    Used                              : 6721 MiB
    Free                              : 1471 MiB

Well, I looked: 8 GB. What should I do with this information? I am not particularly well versed in commands or in programming.

ginpigin avatar Jul 02 '21 15:07 ginpigin

> > Use nvidia-smi and htop to watch your VRAM and RAM while you tune your parameters.
>
>     Total                             : 8192 MiB
>     Used                              : 6721 MiB
>     Free                              : 1471 MiB
>
> Well, I looked: 8 GB. What should I do with this information? I am not particularly well versed in commands or in programming.

Launch the training and watch the VRAM and RAM. If VRAM is at 100% and the training doesn't start, lower the loadsize; if RAM is at 100% and swap usage increases, lower load_thread.
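If watching the numbers by hand is awkward, something like the following sketch can print VRAM usage per GPU while the training runs. It assumes `nvidia-smi` is on the PATH and uses its CSV query output; the function names (`parse_memory_line`, `query_gpu_memory`, `vram_fraction`) are made up for this example and are not part of DeepMosaics.

```python
import subprocess


def parse_memory_line(line):
    """Parse one 'used, total' line (values in MiB) from nvidia-smi CSV output."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def query_gpu_memory():
    """Return a list of (used_mib, total_mib) tuples, one per GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_memory_line(line) for line in out.splitlines() if line.strip()]


def vram_fraction(used_mib, total_mib):
    """Fraction of VRAM in use, between 0.0 and 1.0."""
    return used_mib / total_mib


if __name__ == "__main__":
    try:
        for gpu, (used, total) in enumerate(query_gpu_memory()):
            print(f"GPU {gpu}: {used}/{total} MiB ({vram_fraction(used, total):.0%})")
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print("Could not query nvidia-smi:", err)
```

With the figures posted above (6721 MiB used of 8192 MiB), `vram_fraction` gives roughly 0.82, so that card already has most of its memory in use.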

ethanfel avatar Jul 02 '21 15:07 ethanfel

> Launch the training and watch the VRAM and RAM. If VRAM is at 100% and the training doesn't start, lower the loadsize; if RAM is at 100% and swap usage increases, lower load_thread.

Which parameters should I change? Batch size and load thread, or something else? Within what limits? Should it be a multiple of 2 or of another number, or does it not matter?

ginpigin avatar Jul 03 '21 07:07 ginpigin