
Loss becomes negative after 100 epochs

Open lucasjinreal opened this issue 4 years ago • 6 comments

INFO:openpifpaf.network.trainer:{'type': 'train', 'epoch': 65, 'batch': 840, 'n_batches': 1266, 'time': 0.948, 'data_time': 0.0, 'lr': 0.0005, 'loss': -31.85, 'head_losses': [0.003, 0.097, 0.076, 0.111, 0.02, 0.25, 0.016, 0.027, 0.001, 0.0, 0.014, 0.015, 0.026, 0.019, 0.073, 0.021, 0.013, 0.004, 0.0, 0.039, 0.09, 0.088, 0.037, 0.006, 0.0, 0.003, 0.014, 0.003, 0.012, 0.011, 0.002, 0.002, 0.001, 0.002, 0.012], 'mtl_sigmas': [0.002, 0.301, 0.275, 0.367, 0.081, 0.518, 0.089, 0.171, 0.016, 0.0, 0.1, 0.092, 0.124, 0.122, 0.218, 0.046, 0.09, 0.027, 0.0, 0.176, 0.262, 0.26, 0.157, 0.037, 0.0, 0.003, 0.09, 0.003, 0.087, 0.065, 0.006, 0.01, 0.0, 0.002, 0.072]}

Is this normal?

lucasjinreal avatar Jun 08 '21 08:06 lucasjinreal

Yes, negative loss values can happen when using the --auto-tune-mtl option; this is not a problem. It follows from the formulation used for the loss function. Note that if you train on JAAD with initialization from the OpenPifPaf checkpoint, you should need far fewer epochs to converge.
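To illustrate why such a formulation can go below zero, here is a minimal sketch of an uncertainty-weighted multi-task loss. It assumes the common form sum_i [ L_i / (2 * sigma_i^2) + log(sigma_i) ]; this is an assumption for illustration, not code taken from OpenPifPaf, and the function name and input values are made up. The log(sigma_i) term is negative whenever sigma_i < 1, and every `mtl_sigmas` entry in the log above is below 1.

```python
import math


def uncertainty_weighted_loss(head_losses, sigmas):
    """Illustrative multi-task loss with one learned sigma per head.

    Assumed form (not taken from the OpenPifPaf source):
        total = sum_i( L_i / (2 * sigma_i**2) + log(sigma_i) )
    """
    if len(head_losses) != len(sigmas):
        raise ValueError("need exactly one sigma per head loss")
    total = 0.0
    for loss, sigma in zip(head_losses, sigmas):
        # The data term is non-negative whenever the head loss is non-negative.
        data_term = loss / (2.0 * sigma ** 2)
        # The regularization term is negative whenever sigma < 1.
        reg_term = math.log(sigma)
        total += data_term + reg_term
    return total


# Hypothetical values: small head losses with sigmas well below 1.
print(uncertainty_weighted_loss([0.01, 0.02], [0.1, 0.15]))  # negative total
# With all sigmas fixed at 1 the log term vanishes and the total stays >= 0.
print(uncertainty_weighted_loss([0.01, 0.02], [1.0, 1.0]))
```

Under this assumed form, a negative total only means the learned sigmas are small; it says nothing by itself about the quality of the model.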

taylormordan avatar Jun 08 '21 10:06 taylormordan

@taylormordan I set it to 100 epochs, but the loss went down to -31. What is a normal loss value? Should it end up close to 0?

lucasjinreal avatar Jun 08 '21 11:06 lucasjinreal

If you train for 5-10 epochs, the loss should be around 0 on average over an epoch, but loss values for individual batches can easily be higher or lower. I observe this behavior too.
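To check the per-epoch average rather than single batches, the training log can be aggregated. The sketch below assumes each record looks like the line posted above, i.e. the prefix `INFO:openpifpaf.network.trainer:` followed by a Python dict literal with `type`, `epoch` and `loss` keys; the function name and the sample lines are made up for illustration.

```python
import ast
from collections import defaultdict

# Prefix as it appears in the log line posted in this issue.
PREFIX = "INFO:openpifpaf.network.trainer:"


def mean_train_loss_per_epoch(lines):
    """Return {epoch: mean batch loss} for 'train' records in the log lines."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line.startswith(PREFIX):
            continue
        try:
            # The payload is a Python dict literal, so literal_eval can parse it.
            record = ast.literal_eval(line[len(PREFIX):].strip())
        except (ValueError, SyntaxError):
            continue  # skip lines whose payload is not a plain literal
        if not isinstance(record, dict) or record.get("type") != "train":
            continue
        if "epoch" not in record or "loss" not in record:
            continue
        sums[record["epoch"]] += record["loss"]
        counts[record["epoch"]] += 1
    return {epoch: sums[epoch] / counts[epoch] for epoch in sorted(sums)}


# Made-up sample lines in the assumed format.
sample = [
    "INFO:openpifpaf.network.trainer:{'type': 'train', 'epoch': 0, 'batch': 0, 'loss': 0.5}",
    "INFO:openpifpaf.network.trainer:{'type': 'train', 'epoch': 0, 'batch': 1, 'loss': -0.3}",
    "INFO:openpifpaf.network.trainer:{'type': 'train', 'epoch': 1, 'batch': 0, 'loss': -1.0}",
    "some unrelated line",
]
print(mean_train_loss_per_epoch(sample))
```

In practice the lines would come from the training log file, e.g. by passing an open file object to the function.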

taylormordan avatar Jun 08 '21 20:06 taylormordan

@taylormordan Will the results get worse if I train for more epochs?

lucasjinreal avatar Jun 09 '21 03:06 lucasjinreal

It will start to overfit at some point. The optimal number of epochs may depend on your hyper-parameter choices, though (learning rate, batch size, ...).

taylormordan avatar Jun 14 '21 10:06 taylormordan

Hi, can you please help me with this issue?

python3 -m openpifpaf.train: error: unrecognized arguments: --datasets jaad --jaad-root-dir /content/drive/MyDrive/jaad/JAAD_clips/ --jaad-subset default --jaad-training-set train --jaad-validation-set val --pifpaf-pretraining --detection-bias-prior 0.01 --jaad-head-upsample 2 --jaad-pedestrian-attributes all --fork-normalization-operation power --fork-normalization-duplicates 35 --attribute-regression-loss l1 --attribute-focal-gamma 2

This is what I get after running the script. How did you download the dataset, and what did you do with the annotations?

Affanabbbas avatar Jun 09 '22 06:06 Affanabbbas