pytorch-CycleGAN-and-pix2pix
Use CycleGAN for pose estimation
Hi, thank you for the awesome work.
I have a VLP-16 lidar that I want to use for activity recognition. My plan is to treat point clouds as RGB images so that I can use OpenPose or another pose estimation network. First, I convert my point clouds into depth images; the following are some examples:
I hope CycleGAN can translate these depth images (with specific actions) into corresponding RGB images; that is, if my depth images consist of walking, kicking, writing, climbing, etc., I hope CycleGAN can not only perform image-to-image translation but also translate to the correct action. Is this possible?
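For context, this is roughly how I project a lidar sweep into a depth image (a minimal sketch; the function name, image resolution, and the ±15° vertical field of view are assumptions based on the VLP-16 spec, not code from this repo):

```python
import numpy as np

def pointcloud_to_depth_image(points, h=16, w=1024, max_range=30.0):
    """Spherical projection of an (N, 3) XYZ point cloud onto an (h, w)
    range image. h=16 matches the VLP-16's 16 vertical channels; depth is
    normalized by max_range (an assumed clipping distance) into [0, 255]."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)                 # range per point
    yaw = np.arctan2(y, x)                             # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-6), -1.0, 1.0))

    # The VLP-16's vertical field of view is roughly -15 to +15 degrees.
    fov_up, fov_down = np.radians(15.0), np.radians(-15.0)
    u = (((yaw + np.pi) / (2 * np.pi)) * w).astype(int) % w
    v = (fov_up - pitch) / (fov_up - fov_down) * (h - 1)
    v = np.clip(np.round(v).astype(int), 0, h - 1)

    depth = np.zeros((h, w), dtype=np.float32)
    order = np.argsort(-r)        # write far points first, so near points win
    depth[v[order], u[order]] = r[order]
    return (np.clip(depth / max_range, 0.0, 1.0) * 255).astype(np.uint8)
```

The resulting range image is then resized to 256x256, as in my dataset.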
The link below shows my current results, and the following video is the input:
https://user-images.githubusercontent.com/49118957/156760287-cd8af0cf-06e7-41e7-9a13-477c365d3c09.mp4
Currently my depth images only contain the front view of walking, and there is only one person in the scene. After 30 epochs of training (I only have Colab for training), CycleGAN outputs the above video. I used this command for training:

```
python train.py --dataroot ./datasets/yivlp2yihuman --name yivlp2rgbhuman --model cycle_gan --n_epochs 15 --n_epochs_decay 15
```
I have some questions:
- As mentioned above, is it possible for CycleGAN to translate to the corresponding actions?
- I have already processed my depth images to 256x256 pixels; do I still need cropping and resizing? I know about this issue and tips.md, but in my case I'm not sure whether these steps will benefit training. If I skip cropping and resizing, is it enough to set `--preprocess none` on the command line?
- How much data is enough? I noticed that different datasets range from 400 to 137K images.
May I have your suggestions? Any help is much appreciated :)
Hello,
- It sounds like a reasonable task. But do you have paired data in your dataset? If you have the ground-truth RGB for the input images, you can use pix2pix instead of CycleGAN.
- Yes, `--preprocess none` will disable cropping. Cropping can be beneficial when you don't have enough samples in your dataset, to prevent overfitting.
- This depends on your task and the complexity of the data, so I can't say for sure, but you probably need more than a few thousand images, ideally 50k+.
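For reference, a training run with cropping disabled would just add that flag (a hypothetical invocation reusing the dataset name from above):

```
python train.py --dataroot ./datasets/yivlp2yihuman --name yivlp2rgbhuman --model cycle_gan --preprocess none
```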
If the goal is to pass the output images to pose estimation networks, perhaps you can increase the weight on the L1 loss. It will generate less diverse images, but they will be more faithful to the input data and have fewer artifacts.
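As a rough illustration of what that weight does, here is a minimal sketch of how `lambda_L1` enters the pix2pix generator objective; the function below is illustrative, not the repo's exact code:

```python
import torch
import torch.nn as nn

# Sketch of the pix2pix generator objective: raising --lambda_L1 trades
# output diversity for faithfulness to the paired ground truth.
def generator_loss(loss_gan: torch.Tensor,
                   fake_B: torch.Tensor,
                   real_B: torch.Tensor,
                   lambda_L1: float = 100.0) -> torch.Tensor:
    l1 = nn.L1Loss()(fake_B, real_B)   # pixel-wise reconstruction term
    return loss_gan + lambda_L1 * l1   # total objective minimized by G
```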
Hi @taesungp, thank you very much for your reply :)
Yes, I have paired data. However, my depth images are distorted, so they may not correspond accurately to the same locations as the RGB images. That is why I use CycleGAN. Given my task, is pix2pix better than CycleGAN?
In pix2pix_model.py, I found:

```
parser.add_argument('--lambda_L1', type=float, default=100.0, help='weight for L1 loss')
```
And CycleGAN (cycle_gan_model.py) has:

```
lambda_A, lambda_B: default=10.0
lambda_identity: default=0.5
```
But I don't know how much to increase these weights; this issue raises the lambda weight from 10 to 20. Is there any rule to follow? In other words, I'm wondering what a reasonable range for these weights is.
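From what I can tell, the weights combine roughly like this (a self-contained paraphrase of cycle_gan_model.py with illustrative names; both criteria are L1 losses in the repo):

```python
import torch.nn as nn

l1 = nn.L1Loss()

# Paraphrased weighting from CycleGANModel.backward_G: lambda_A / lambda_B
# scale the cycle-consistency terms, and lambda_identity further scales
# the identity-mapping terms relative to them.
def cycle_and_identity_losses(real_A, real_B, rec_A, rec_B, idt_A, idt_B,
                              lambda_A=10.0, lambda_B=10.0, lambda_idt=0.5):
    loss_cycle_A = l1(rec_A, real_A) * lambda_A             # ||G_B(G_A(A)) - A||
    loss_cycle_B = l1(rec_B, real_B) * lambda_B             # ||G_A(G_B(B)) - B||
    loss_idt_A = l1(idt_A, real_B) * lambda_B * lambda_idt  # G_A(B) should stay near B
    loss_idt_B = l1(idt_B, real_A) * lambda_A * lambda_idt  # G_B(A) should stay near A
    return loss_cycle_A + loss_cycle_B + loss_idt_A + loss_idt_B
```

So raising lambda_A/lambda_B makes the reconstructions count for more relative to the adversarial terms.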
In addition, I can only train for 100 epochs at a time on Colab; afterwards I use `--continue_train` for the next cycle. Compared to training in one go, I'm not sure whether my training configuration is proper (I'm worried about the learning rate decay).
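If I read get_scheduler in models/networks.py correctly, the default `--lr_policy linear` derives the decay factor from the epoch index plus `--epoch_count`, so passing the right `--epoch_count` when resuming should keep the decay on schedule (paraphrased sketch):

```python
# The LR stays at its initial value for the first n_epochs, then decays
# linearly to zero over the next n_epochs_decay epochs; epoch_count offsets
# the schedule so that a resumed run continues where it left off.
def lambda_rule(epoch, epoch_count=1, n_epochs=100, n_epochs_decay=100):
    return 1.0 - max(0, epoch + epoch_count - n_epochs) / float(n_epochs_decay + 1)
```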
May I have your suggestions? Any help is much appreciated!
Hi @taesungp, here are my recent experiment results:
My dataset has 9411 depth images and 9081 RGB images; all images are 256x256 and cover three actions: forward/backward, wave hands, and forward bend.
I use this command to train CycleGAN on Colab:

```
!python train.py --dataroot ./datasets/yicyclepix_0322 --name yivlp2rgbhuman --model cycle_gan --n_epochs 100 --n_epochs_decay 100 --epoch_count 180 --continue_train --lambda_A 25 --lambda_B 25 --batch_size 3 --preprocess crop --load_size 256 --crop_size 224 --display_id -1
```
Training options:
```
----------------- Options ---------------
batch_size: 3 [default: 1]
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: True [default: False]
crop_size: 224 [default: 256]
dataroot: ./datasets/yicyclepix_0322 [default: None]
dataset_mode: unaligned
direction: AtoB
display_env: main
display_freq: 400
display_id: -1 [default: 1]
display_ncols: 4
display_port: 8097
display_server: http://localhost
display_winsize: 256
epoch: latest
epoch_count: 180 [default: 1]
gan_mode: lsgan
gpu_ids: 0
init_gain: 0.02
init_type: normal
input_nc: 3
isTrain: True [default: None]
lambda_A: 25.0 [default: 10.0]
lambda_B: 25.0 [default: 10.0]
lambda_identity: 0.5
load_iter: 0 [default: 0]
load_size: 256 [default: 286]
lr: 0.0002
lr_decay_iters: 50
lr_policy: linear
max_dataset_size: inf
model: cycle_gan
n_epochs: 100
n_epochs_decay: 100
n_layers_D: 3
name: yivlp2rgbhuman [default: experiment_name]
ndf: 64
netD: basic
netG: resnet_9blocks
ngf: 64
no_dropout: True
no_flip: False
no_html: False
norm: instance
num_threads: 4
output_nc: 3
phase: train
pool_size: 50
preprocess: crop [default: resize_and_crop]
print_freq: 100
save_by_iter: False
save_epoch_freq: 5
save_latest_freq: 5000
serial_batches: False
suffix:
update_html_freq: 1000
use_wandb: False
verbose: False
----------------- End -------------------
```
After 200 epochs of training, I plotted the losses:
And these videos are the test results:
For forward/backward: Input, Output.
For wave hands: Input, Output.
For forward bend: Input, Output.
I add these flags when testing my CycleGAN model: `--batch_size 3 --preprocess crop --load_size 256 --crop_size 224 --no_dropout`. As these videos show, the results are not good. Could you give me some suggestions?
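One thing I'm unsure about: if I understand the repo's preprocessing, `--preprocess crop` applies a random crop at test time as well, which could add frame-to-frame jitter to the videos. Testing without cropping might be worth a try, e.g. (hypothetical command):

```
python test.py --dataroot ./datasets/yicyclepix_0322 --name yivlp2rgbhuman --model cycle_gan --preprocess none --no_dropout
```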
Besides, I tried another training configuration: I changed --load_size to 286, --crop_size to 256, and added --netG:

```
!python train.py --dataroot ./datasets/yicyclepix_0322 --name yivlp2rgbhuman --model cycle_gan --n_epochs 100 --n_epochs_decay 100 --epoch_count 89 --continue_train --lambda_A 25 --lambda_B 25 --batch_size 3 --netG resnet_6blocks --preprocess crop --load_size 286 --crop_size 256 --display_id -1
```
Training options:
```
----------------- Options ---------------
batch_size: 3 [default: 1]
beta1: 0.5
checkpoints_dir: ./checkpoints
continue_train: True [default: False]
crop_size: 256
dataroot: ./datasets/yicyclepix_0322 [default: None]
dataset_mode: unaligned
direction: AtoB
display_env: main
display_freq: 400
display_id: -1 [default: 1]
display_ncols: 4
display_port: 8097
display_server: http://localhost
display_winsize: 256
epoch: latest
epoch_count: 89 [default: 1]
gan_mode: lsgan
gpu_ids: 0
init_gain: 0.02
init_type: normal
input_nc: 3
isTrain: True [default: None]
lambda_A: 25.0 [default: 10.0]
lambda_B: 25.0 [default: 10.0]
lambda_identity: 0.5
load_iter: 0 [default: 0]
load_size: 286
lr: 0.0002
lr_decay_iters: 50
lr_policy: linear
max_dataset_size: inf
model: cycle_gan
n_epochs: 100
n_epochs_decay: 100
n_layers_D: 3
name: yivlp2rgbhuman [default: experiment_name]
ndf: 64
netD: basic
netG: resnet_6blocks [default: resnet_9blocks]
ngf: 64
no_dropout: True
no_flip: False
no_html: False
norm: instance
num_threads: 4
output_nc: 3
phase: train
pool_size: 50
preprocess: crop [default: resize_and_crop]
print_freq: 100
save_by_iter: False
save_epoch_freq: 5
save_latest_freq: 5000
serial_batches: False
suffix:
update_html_freq: 1000
use_wandb: False
verbose: False
----------------- End -------------------
```
Although I have only trained this CycleGAN to 133 epochs so far, the images generated during training suggest that the results may still be poor:
May I have your suggestions? Any help is much appreciated :)