Tips for training
Thank you for making this repository.
It can be hard to run custom training due to some issues in the repo. Here are tips to make it work:
Env: RTX 3080, Ubuntu 18.04, Python 3.8, PyTorch 1.7.0, CUDA 11, CUDA Toolkit 11.0.221, mmcv (torch1.7.0 and cu110), mmgeneration (f6551e1)
1. Issue related to mmgeneration: SiLU is already registered
Comment out line 35 (the decorator that registers SiLU) in mmgeneration/mmgen/models/architectures/ddpm/modules.py, then install mmgeneration.
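The error happens because module registries refuse duplicate keys, and SiLU ends up registered twice. A minimal sketch of the failure mode, using a hypothetical Registry class rather than mmcv's actual implementation:

```python
# Hypothetical minimal registry illustrating the duplicate-registration error;
# mmcv's real Registry behaves similarly but is more elaborate.
class Registry:
    def __init__(self):
        self._modules = {}

    def register_module(self, cls):
        # Registries typically raise when the same name is registered twice.
        if cls.__name__ in self._modules:
            raise KeyError(f'{cls.__name__} is already registered')
        self._modules[cls.__name__] = cls
        return cls


MODULES = Registry()


@MODULES.register_module
class SiLU:
    pass


# Simulates mmgeneration's ddpm/modules.py registering SiLU a second time;
# commenting out that decorator line is what avoids this error.
try:
    @MODULES.register_module
    class SiLU:  # noqa: F811
        pass
except KeyError as e:
    print('duplicate registration:', e)
```

This is why the fix is simply to remove one of the two registration sites before installing.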
2. Build error when compiling custom operators: nvcc fatal : Unsupported gpu architecture 'compute_86' on newer graphics cards
Run export TORCH_CUDA_ARCH_LIST="8.0" before running.
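The variable can equivalently be set from Python before the custom ops are compiled. A sketch of why "8.0" works here: the RTX 3080 is natively compute capability 8.6, which this toolchain's nvcc cannot target, while sm_80 binaries still run on 8.6 hardware (same Ampere major version):

```python
import os

# Force nvcc to build for compute capability 8.0. The RTX 3080 is natively
# 8.6, but this CUDA toolchain rejects 'compute_86'; sm_80 code remains
# compatible with sm_86 GPUs. Must be set before the custom ops compile.
os.environ['TORCH_CUDA_ARCH_LIST'] = '8.0'
```

Setting it inside the training script only helps if it runs before the first import that triggers the extension build; otherwise use the shell export above.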
3. Pretrained weights are missing.
Download the pretrained weights referenced in the configs into work_dirs/pre-trained/ before running.
4. Running the trainer directly with Python.
Add the agilegan folder to the import path in tools/train.py:
import sys
sys.path.append(PATH_TO_MMGEN-FaceStylor)
import agilegan  # isort:skip  # noqa
then run python tools/train.py PATH_TO_YOUR_CONFIGS --work-dir PATH_TO_YOUR_DIR --gpus GPU_NUMS.
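The sys.path line works because Python resolves imports against that list at import time. A self-contained demonstration, with a temporary directory and a throwaway package name (agilegan_demo) standing in for the MMGEN-FaceStylor checkout and its agilegan folder:

```python
import os
import sys
import tempfile

# Create a throwaway package to stand in for the agilegan folder.
tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, 'agilegan_demo')
os.makedirs(pkg)
with open(os.path.join(pkg, '__init__.py'), 'w') as f:
    f.write("VERSION = 'demo'\n")

# Same mechanism as the tools/train.py fix: extend the search path,
# and the import that previously failed now succeeds.
sys.path.append(tmp)
import agilegan_demo  # noqa: E402

print(agilegan_demo.VERSION)
```

Appending the repo root (not the agilegan folder itself) is what makes `import agilegan` resolve.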
5. object has no attribute 'module' in transfer.py and a log_buffer broadcasting error
Fix agilegan/transfer.py as follows.
Around lines 140 and 210:
# obtain some params
# fix
# g_log_size = self.generator.module.log_size
g_log_size = self.generator.log_size
# fix
# if hasattr(self.generator.module, 'num_layers'):
if hasattr(self.generator, 'num_layers'):
    # fix
    # g_num_layers = self.generator.module.num_layers
    g_num_layers = self.generator.num_layers
else:
    # fix
    # g_num_layers = self.generator.module.num_injected_noises
    g_num_layers = self.generator.num_injected_noises
# fix
# d_log_size = self.discriminator.module.log_size
d_log_size = self.discriminator.log_size
Around line 460:
# update ada p
# fix
# if hasattr(self.discriminator.module,
#            'with_ada') and self.discriminator.module.with_ada:
if hasattr(self.discriminator,
           'with_ada') and self.discriminator.with_ada:
    # self.discriminator.module.ada_aug.log_buffer[0] += 1
    self.discriminator.ada_aug.log_buffer[0] += 1
    # self.discriminator.module.ada_aug.log_buffer[
    self.discriminator.ada_aug.log_buffer[
        # fix
        # 1] += disc_pred_real.sign()
        1] += disc_pred_real.sign().sum()
    # self.discriminator.module.ada_aug.update(iteration=curr_iter,
    self.discriminator.ada_aug.update(iteration=curr_iter,
                                      num_batches=batch_size)
log_vars_disc['ada_prob'] = (
    # self.discriminator.module.ada_aug.aug_pipeline.p.data)
    self.discriminator.ada_aug.aug_pipeline.p.data)
6. Issue related to the logger: GPU tensors must be moved with cpu() before calling numpy().
Disable the text logger in your config; you can still find the info in the work directory.
# log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')])
log_config = dict(interval=100, hooks=[])
Hope this helps.
bro, u almost solved all the problems I met. Respect..
- object has no attribute 'module' in transfer.py and log_buffer broadcasting error
This should be mean, not sum:
# update ada p
# fix
# if hasattr(self.discriminator.module,
#            'with_ada') and self.discriminator.module.with_ada:
if hasattr(self.discriminator,
           'with_ada') and self.discriminator.with_ada:
    # self.discriminator.module.ada_aug.log_buffer[0] += 1
    self.discriminator.ada_aug.log_buffer[0] += 1
    # self.discriminator.module.ada_aug.log_buffer[
    self.discriminator.ada_aug.log_buffer[
        # fix
        # 1] += disc_pred_real.sign()
        1] += disc_pred_real.sign().mean()
    # self.discriminator.module.ada_aug.update(iteration=curr_iter,
    self.discriminator.ada_aug.update(iteration=curr_iter,
                                      num_batches=batch_size)
log_vars_disc['ada_prob'] = (
    # self.discriminator.module.ada_aug.aug_pipeline.p.data)
    self.discriminator.ada_aug.aug_pipeline.p.data)
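The underlying broadcasting error that both fixes address can be reproduced with NumPy standing in for torch tensors (the logits below are made-up values): log_buffer holds two scalar slots, so accumulating a whole batch of signs into one slot fails until the batch is reduced to a scalar.

```python
import numpy as np

# log_buffer has two scalar slots: [batch count, accumulated sign of D(real)].
log_buffer = np.zeros(2)
disc_pred_real = np.array([0.3, -0.7, 1.2, -0.1])  # made-up batch of logits

# The unfixed line tries to add a length-4 vector into a single scalar slot:
try:
    log_buffer[1] += np.sign(disc_pred_real)
except ValueError as e:
    print('broadcast error:', e)

# Reducing to a scalar first (.sum() in the original fix, .mean() per this
# comment) is what makes the accumulation well-formed.
log_buffer[0] += 1
log_buffer[1] += np.sign(disc_pred_real).sum()
print(log_buffer)
```

Whether sum or mean is correct depends on how ada_aug.update later normalizes the buffer; with a sum, dividing by the recorded batch count recovers the mean sign that the ADA heuristic tracks.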
How to show progress while training?