
[CLIFF] Question about training CLIFF

Open MooreManor opened this issue 1 year ago • 10 comments

@zhihaolee I tried training CLIFF (ResNet-50) from scratch.

Here is my experiment setting. I used 4 GPUs with SyncBatchNorm, with a per-GPU batch size of 64. I trained CLIFF on Human3.6M, COCO (your pseudo-GT), MPII (your pseudo-GT), MPI-INF-3DHP, and the 3DPW train set with sampling partitions of 0.4, 0.3, 0.3, 0.1, and 0.2, respectively. I didn't use PARE's synthetic-occlusion augmentation. I used a learning rate of 1e-4 and didn't reduce it mid-training. I initialized with ResNet-50 weights pretrained on COCO instead of ImageNet. The input image size is 256x192. A sketch of this setup is shown below.
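For reference, here is a minimal sketch of how I wired up the datasets and DDP. This is not the actual CLIFF code; `make_dataset`, `model`, and `local_rank` are placeholders from my own script.

```python
# Minimal sketch of my multi-dataset + 4-GPU setup (placeholders, not CLIFF's code).
import torch
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

# Sampling partitions from my config (they need not sum to 1;
# WeightedRandomSampler only uses relative weights).
partitions = {'h36m': 0.4, 'coco': 0.3, 'mpii': 0.3,
              'mpi_inf_3dhp': 0.1, '3dpw': 0.2}
datasets = {name: make_dataset(name) for name in partitions}  # placeholder factory

# Give every sample a weight so each dataset is drawn with its partition probability.
weights = torch.cat([torch.full((len(ds),), partitions[name] / len(ds))
                     for name, ds in datasets.items()])
concat = ConcatDataset(list(datasets.values()))
sampler = WeightedRandomSampler(weights, num_samples=len(concat), replacement=True)
loader = DataLoader(concat, batch_size=64, sampler=sampler, num_workers=8)

# 4-GPU DDP with SyncBatchNorm, as in my runs.
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
```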

The paper mentions that the learning rate is set to 1e-4 and reduced by a factor of 10 at the midpoint of training; a sketch of that schedule is below.
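This is the schedule as I read it, assuming Adam and a 200-epoch run (the authors' exact optimizer settings may differ; `train_one_epoch` is a placeholder for my training loop):

```python
# 1e-4, dropped 10x at the midpoint (epoch 100 of 200).
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # model as above
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100], gamma=0.1)

for epoch in range(200):
    train_one_epoch(model, loader, optimizer)  # placeholder training loop
    scheduler.step()
```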

The lr reduction is scheduled for the midpoint of training (i.e., the 100th epoch of 200). However, at around the 12th epoch, my model already reached 74.9 MPJPE on 3DPW, which is close to the 72.0 MPJPE reported in your paper. Evaluating your released checkpoint on 3DPW on my machine gives about 73.1 MPJPE, so only 12 of the planned 200 epochs of training already produce a similar result. [screenshot: evaluation results]
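For reference, this is how I compute MPJPE (a standard root-relative sketch in millimeters; your eval script may differ, e.g., in the joint regressor or alignment):

```python
import numpy as np

def mpjpe(pred, gt, root=0):
    """pred, gt: (N, J, 3) joint positions in meters."""
    pred = pred - pred[:, root:root + 1]  # root-align predictions
    gt = gt - gt[:, root:root + 1]        # root-align ground truth
    return float(np.linalg.norm(pred - gt, axis=-1).mean() * 1000.0)
```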

  • Given this evaluation performance, should I still wait until the 100th epoch to reduce the learning rate, or should I reduce it earlier, e.g., around the 10th epoch?
  • Is the convergence speed of my experiment similar to yours?

Here is the training log. Did I do something wrong? [screenshot: training log]

MooreManor · Nov 21 '22