
Why doesn't the loss converge? The loss value oscillates between 1 and 3

Open MrCuiHao opened this issue 5 years ago • 0 comments

@weiliu89 Hi, thanks for your great contribution to object detection. I am trying to learn SSD and have run into some questions I can't work out.

First, I want to detect whether workers are wearing yellow hard hats, wearing blue overalls, or talking on the phone.

So I prepared my own dataset with 4 classes:

1. A person wearing blue overalls. I used the annotation tool to mark the whole person, or the upper body including the yellow helmet.
2. A person not wearing blue overalls, marked the same way (the whole person, or the upper body including the yellow helmet).
3. Yellow helmet. The hat and the head are marked together.
4. Calling. The upper body is marked where the person is making a call with a hand raised.

I expect the features of these 4 classes to be distinguishable. What do you think of this scheme?

I have one GPU, a ZOTAC GeForce GTX 1080 Ti Mini with 11 GB of memory, as shown in this screenshot: 选区_010

Everything is fine until training. I changed the parameters to suit my environment; they are as follows:

    # Specify the batch sampler.
    resize_width = 512
    resize_height = 512

    min_dim = 512

    batch_size = 8
    accum_batch_size = 8
    # iter_size = accum_batch_size / batch_size
    iter_size = 2

    solver_param = {
        # Train parameters
        'base_lr': 0.0005,
        'weight_decay': 0.0005,
        'lr_policy': "multistep",
        'stepvalue': [160000, 200000, 240000],
        'gamma': 0.1,
        'momentum': 0.9,
        'iter_size': iter_size,
        'max_iter': 240000,
        'snapshot': 40000,
        'display': 10,
        'average_loss': 10,
        'type': "SGD",
        'solver_mode': solver_mode,
        'device_id': device_id,
        'debug_info': False,
        'snapshot_after_train': True,
        # Test parameters
        'test_iter': [test_iter],
        'test_interval': 200,
        'eval_type': "detection",
        'ap_version': "11point",
        'test_initialization': False,
    }
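To make it easier to reason about these settings, here is a minimal sketch of what they imply. It assumes the usual Caffe behaviour: gradients are accumulated over `iter_size` forward/backward passes before each weight update, and the `multistep` policy multiplies the learning rate by `gamma` each time a `stepvalue` is reached. The helper names `effective_batch_size` and `multistep_lr` are hypothetical and not part of Caffe. With `iter_size = 2` hard-coded, each update sees 16 images, not the 8 that `accum_batch_size` suggests.

```python
import bisect

# Values copied from the settings in this post.
batch_size = 8
iter_size = 2
base_lr = 0.0005
gamma = 0.1
stepvalue = [160000, 200000, 240000]


def effective_batch_size(batch_size, iter_size):
    """Images contributing to one weight update (hypothetical helper)."""
    return batch_size * iter_size


def multistep_lr(iteration, base_lr, gamma, stepvalue):
    """Learning rate at `iteration` under a multistep policy (hypothetical helper).

    The rate is multiplied by gamma once for every step value that has
    been reached, i.e. every step value <= iteration.
    """
    steps_passed = bisect.bisect_right(stepvalue, iteration)
    return base_lr * gamma ** steps_passed


print("images per update:", effective_batch_size(batch_size, iter_size))
for it in (0, 159999, 160000, 200000):
    print(it, multistep_lr(it, base_lr, gamma, stepvalue))
```

Under these assumptions the learning rate stays at `base_lr` for the first 160000 iterations, so any oscillation seen early in training happens at a constant rate.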

The training result is as follows (see the picture). As you can see, the loss does not converge and oscillates between 1 and 3. How can I improve this? I tried adjusting base_lr: the largest value I used was 0.005, and the picture here shows a run with base_lr of 0.0005, ten times smaller. The problem is still not solved. Also, when I test the snapshot model, the confidence of some detected objects is low (a little over 0.1), while for others it is high (over 0.9). Can you give me some advice? Thanks, I hope for your reply. 选区_009
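With `average_loss` set to 10, each displayed loss is averaged over only the last 10 iterations, so it can look noisy at a small batch size. One way to check whether there is an underlying trend is to smooth the logged values over a longer window. The sketch below is a plain trailing moving average; the function name `moving_average` and the sample numbers are hypothetical and are not taken from the actual training log.

```python
def moving_average(values, window):
    """Trailing mean over up to `window` most recent values.

    The first few outputs average over fewer than `window` points,
    because not enough history exists yet.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averaged = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        averaged.append(running / min(i + 1, window))
    return averaged


# Hypothetical loss readings, for illustration only.
losses = [3.0, 1.0, 2.0, 3.0, 1.0, 2.0]
print(moving_average(losses, window=3))
```

If the smoothed curve is still flat over tens of thousands of iterations, the oscillation is probably not just display noise.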

MrCuiHao · May 09 '19 05:05