total_loss is too high when training head

Open RayDean opened this issue 1 year ago • 3 comments

When I train the head NeRF, total_loss is very high (nearly 580) as training approaches 250K steps, while the other losses look normal. Partial logs:

| Validation results@248000: {'total_loss': 582.6377294922, 'mse_loss': 0.0012603372, 'sr_mse_loss': 0.0013412535, 'lpips_loss': 1.0247015435, 'sr_lpips_loss': 1.1453602004, 'sr_lip_lpips_loss': 1.0416638839, 'lambda_ambient': 579.4234008789}
03/06 04:17:08 PM Epoch 00000@248000: saving model to checkpoints/motion2video_nerf/meimei_head/model_ckpt_steps_248000.ckpt
03/06 04:17:08 PM Delete ckpt: model_ckpt_steps_246000.ckpt

Is this high loss normal? If not, how can I lower total_loss? Thanks.
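
A quick arithmetic check on the validation log above, as a sketch: adding up every logged entry other than total_loss reproduces the reported total_loss to within rounding. That suggests the total includes lambda_ambient, which by its name looks like a weighting coefficient rather than an error term. This is an inference from the logged numbers only and has not been confirmed against the GeneFacePlusPlus code.

```python
# Values copied from the "Validation results@248000" log line.
logged = {
    "mse_loss": 0.0012603372,
    "sr_mse_loss": 0.0013412535,
    "lpips_loss": 1.0247015435,
    "sr_lpips_loss": 1.1453602004,
    "sr_lip_lpips_loss": 1.0416638839,
    "lambda_ambient": 579.4234008789,
}
reported_total = 582.6377294922

# Sum of every logged entry, including lambda_ambient.
summed = sum(logged.values())
residual = reported_total - summed

# The same sum with lambda_ambient left out.
without_lambda = summed - logged["lambda_ambient"]

print(f"sum of logged entries : {summed:.6f}")
print(f"reported total_loss   : {reported_total:.6f}")
print(f"residual              : {residual:.2e}")
print(f"sum without lambda    : {without_lambda:.6f}")
```

If that reading is right, the loss terms themselves add up to roughly 3.2, and the figure near 580 mostly reflects the value of lambda_ambient at that step.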

RayDean, Mar 06 '24 08:03

And when the 250K steps finished, the final total_loss is inf; lpips_loss and sr_lpips_loss are also inf.

| Training end.. Epoch 0 ended. Steps: 250001. {'total_loss': inf, 'mse_loss': 0.0024544131240717033, 'weights_entropy_loss': 0.050688008500976045, 'num_non_facemask': 56165.82106877656, 'ambient_loss': 2.8842663650615378e-08, 'sr_mse_loss': 0.0008115496115366654, 'lambda_ambient': 469.427371226522, 'head_psnr': 27.943281164014728, 'density_grid_info_min_density': -1.0, 'density_grid_info_max_density': 364738707.3452703, 'density_grid_info_mean_density': 1790.830806371328, 'density_grid_info_occupancy_rate': 0.25496578732052366, 'density_grid_info_step_mean_count': 299778.5135135135, 'lpips_loss': inf, 'sr_lpips_loss': inf, 'sr_lip_lpips_loss': 1.1583641622033645}

Is this normal? How can I fix it? Thanks.
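
One possible reading of the inf values, offered as a sketch: if the end-of-training summary averages per-step values (the fractional num_non_facemask of 56165.82 hints at this, but it is an assumption), then a single non-finite step is enough to make the whole average inf, even when every other step is fine. The per-step numbers below are made up for illustration and are not taken from the logs; `mean_and_bad_steps` is a hypothetical helper, not part of GeneFacePlusPlus.

```python
import math


def mean_and_bad_steps(values):
    """Return the plain mean and the indices of non-finite entries."""
    bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
    mean = sum(values) / len(values)
    return mean, bad


# Made-up per-step lpips values with one overflowed step.
per_step_lpips = [1.02, 1.05, float("inf"), 1.01]

mean, bad = mean_and_bad_steps(per_step_lpips)

# Mean over the finite entries only, for comparison.
finite = [v for v in per_step_lpips if math.isfinite(v)]
finite_mean = sum(finite) / len(finite)

print("plain mean       :", mean)
print("non-finite steps :", bad)
print("finite-only mean :", round(finite_mean, 4))
```

Under that assumption, an inf in the final summary would not by itself show that training diverged. The validation values at step 248000 and the finite sr_lip_lpips_loss in the same summary are at least consistent with an isolated bad step.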

RayDean, Mar 06 '24 08:03

same issue

Oyiyi, Mar 18 '24 05:03

How many steps does head NeRF training run for, and how long does it take? Can we stop the process and then resume it from the same checkpoint?

MohitPanpaliya, Apr 22 '24 12:04