
Retraining 512x512 with 68 keypoints on Google Cloud Free Trial

Open ghost opened this issue 3 years ago • 11 comments

I want to retrain FOMM with 68 keypoints and release it publicly if successful. I haven't used Google Cloud or trained FOMM yet, but if it's feasible, I'm going to research it further. I was wondering whether it's possible to retrain at 512x512 with 68 keypoints on Google Cloud within their $300 trial period. Would that be enough, or would the training take too long? (It took 5 days for this person to train 512x512 on a 3090 with 10 keypoints: https://github.com/AliaksandrSiarohin/first-order-model/issues/20#issuecomment-819614134) For example, two Tesla T4s would cost $0.76 per hour, but I guess there are some additional costs for other hardware.
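A back-of-envelope check of the numbers in the question, as a minimal sketch. Assumptions: the $0.76/hour figure covers two T4s (GPU cost only, so CPU/RAM/disk come on top), and the 5-day figure from the linked single-3090 run is only a rough lower bound, since T4s are slower than a 3090.

```python
# Rough cost estimate: how far does the $300 free-trial credit stretch
# at the quoted two-T4 rate, and what would a 5-day run cost?

BUDGET_USD = 300.0        # Google Cloud free-trial credit
RATE_USD_PER_HOUR = 0.76  # two Tesla T4s (GPU cost only; other hardware is extra)

budget_hours = BUDGET_USD / RATE_USD_PER_HOUR
budget_days = budget_hours / 24

reference_days = 5  # reported 512x512 training time on a single RTX 3090
reference_cost = reference_days * 24 * RATE_USD_PER_HOUR

print(f"Credit buys about {budget_days:.1f} days of GPU time at this rate")
print(f"A 5-day run would cost about ${reference_cost:.0f} in GPU time")
```

So the GPU cost alone fits comfortably in the credit; the real risk is that two T4s take much longer than 5 days, plus the non-GPU costs.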

ghost avatar Jun 04 '21 10:06 ghost

Would love to help with this; please get back ASAP.

revoconner avatar Sep 16 '21 16:09 revoconner

I'm here. I haven't researched more about this since commenting, but I'm still interested in trying.

ghost avatar Sep 16 '21 17:09 ghost

I have four Tesla V100s, fully paid, waiting. If you could show me how to train, I would love to share the end results publicly. Mail me at [email protected]

revoconner avatar Sep 16 '21 17:09 revoconner

I also don't have experience with training the models. Maybe @AliaksandrSiarohin or @adeptflax could help or invite someone else who could help.

I've attached the config file used for training the 512x512 model, with the num_kp parameter changed to 68. I need confirmation on whether that's enough, or whether other parameters should be changed as well. vox-512.yaml.zip
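For reference, the relevant fragment of the config would look roughly like this. This is paraphrased from the repo's vox-256.yaml layout, not copied from the attached file, so the exact keys should be checked against it; and as noted later in the thread, switching to supervised landmarks likely requires code changes beyond the config (e.g. the jacobian estimation and equivariance losses may need revisiting):

```yaml
model_params:
  common_params:
    num_kp: 68               # was 10; 68 to match dlib/face-alignment landmarks
    num_channels: 3
    estimate_jacobian: True  # may need revisiting for supervised keypoints
```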

Also I need to know whether the command CUDA_VISIBLE_DEVICES=0,1,2,3 python run.py --config config/vox-512.yaml --device_ids 0,1,2,3 should be run after downloading the VoxCeleb dataset. And where should the downloaded data be placed?

Also what batch size should be chosen for four Tesla V100?

If someone could confirm or help with these, or just explain the process in their own words, getting training started should be straightforward.

ghost avatar Sep 16 '21 17:09 ghost

@AliaksandrSiarohin Please explain, really would love your input

revoconner avatar Sep 19 '21 07:09 revoconner

I presume that 68 keypoints means supervised keypoints obtained from the face-alignment lib or dlib? You need to modify the code in that case.
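A minimal sketch of one piece of the modification this implies: feeding supervised landmarks in place of the learned detector's output. It assumes landmarks arrive as an (N, 2) array of pixel coordinates (the shape the face-alignment library returns per face), and that FOMM's keypoint coordinates are normalized to [-1, 1], so supervised landmarks must be rescaled the same way. The function name is illustrative, not part of the repo's actual API.

```python
import numpy as np

def landmarks_to_fomm_kp(landmarks: np.ndarray, height: int, width: int) -> np.ndarray:
    """Map (N, 2) pixel landmarks (x, y) into FOMM's [-1, 1] coordinate range."""
    kp = landmarks.astype(np.float64).copy()
    kp[:, 0] = 2.0 * kp[:, 0] / (width - 1) - 1.0   # x -> [-1, 1]
    kp[:, 1] = 2.0 * kp[:, 1] / (height - 1) - 1.0  # y -> [-1, 1]
    return kp

# Example: three landmarks on a 512x512 frame (corners and center)
pts = np.array([[0.0, 0.0], [255.5, 255.5], [511.0, 511.0]])
print(landmarks_to_fomm_kp(pts, 512, 512))
```

Beyond this rescaling, the training loop would need the detector's forward pass replaced (or bypassed) wherever keypoints are consumed, which is the non-trivial part Aliaksandr is pointing at.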

AliaksandrSiarohin avatar Sep 19 '21 07:09 AliaksandrSiarohin

@AliaksandrSiarohin what would be a good way to train models for 512x512? I am very new to this, so I don't understand much.

revoconner avatar Sep 19 '21 07:09 revoconner

@revoconner would it be possible to test run the GPUs (just to see if training works)? I think we could figure out how to train. The problem is with modifying the code for 68 keypoints. I have no idea what to do and I also don't know much Python. So we need the help of someone who could modify the code.

ghost avatar Sep 22 '21 20:09 ghost

Guys, already running the trainer, hoping for the best.

revoconner avatar Sep 23 '21 19:09 revoconner

Are you running with 68 keypoints? Or just 512x512? Because a 512x512 model has already been released by adeptflax: https://github.com/AliaksandrSiarohin/first-order-model/issues/20#issuecomment-822862456

ghost avatar Sep 23 '21 19:09 ghost

@revoconner Hey! Were you able to train it at 512x512? Which dataset did you use for this resolution?

mdv3101 avatar Jan 28 '22 07:01 mdv3101