OpenUnReID
Loss becomes nan when I try to train MMT for 100 epochs
Hello, I downloaded your code and the Market1501-UDA-MMT config.yaml from your ModelZoo. First, I trained the model with the downloaded configuration unchanged and got the expected results on Market1501 after 50 epochs: mAP 81.0% / R-1 92.3%. However, after I changed the total number of epochs from 50 to 100, all losses became nan in epoch 64.
Here's the log:
************************* Finished updating pseudo label *************************
Epoch: [64][ 0/400] Time 0.618 (0.618) Acc@1 46.88% (46.88%) cross_entropy 5.810 (5.810) soft_entropy 5.974 (5.974) softmax_triplet 0.095 (0.095) soft_softmax_triplet 0.113 (0.113)
Epoch: [64][ 10/400] Time 0.412 (0.432) Acc@1 70.31% (52.41%) cross_entropy 4.323 (5.268) soft_entropy 5.372 (6.020) softmax_triplet 0.236 (0.333) soft_softmax_triplet 0.338 (0.314)
Epoch: [64][ 20/400] Time 0.410 (0.422) Acc@1 51.56% (53.65%) cross_entropy 5.469 (5.231) soft_entropy 6.135 (5.920) softmax_triplet 0.829 (0.415) soft_softmax_triplet 0.786 (0.417)
Epoch: [64][ 30/400] Time 0.411 (0.418) Acc@1 59.38% (53.43%) cross_entropy 5.301 (5.288) soft_entropy 6.075 (5.977) softmax_triplet 0.159 (0.350) soft_softmax_triplet 0.162 (0.367)
Epoch: [64][ 40/400] Time 0.403 (0.416) Acc@1 48.44% (53.58%) cross_entropy 5.748 (5.311) soft_entropy 6.841 (6.009) softmax_triplet 0.650 (0.363) soft_softmax_triplet 0.841 (0.390)
Epoch: [64][ 50/400] Time 0.411 (0.421) Acc@1 90.62% (56.43%) cross_entropy 4.243 (5.240) soft_entropy 6.419 (6.050) softmax_triplet 0.711 (0.392) soft_softmax_triplet 0.642 (0.412)
Epoch: [64][ 60/400] Time 0.411 (0.420) Acc@1 84.38% (60.40%) cross_entropy 4.272 (5.112) soft_entropy 5.915 (6.013) softmax_triplet 0.178 (0.400) soft_softmax_triplet 0.181 (0.413)
Epoch: [64][ 70/400] Time 0.411 (0.419) Acc@1 82.81% (63.34%) cross_entropy 4.216 (5.005) soft_entropy 5.374 (5.953) softmax_triplet 0.143 (0.400) soft_softmax_triplet 0.147 (0.416)
Epoch: [64][ 80/400] Time 0.411 (0.418) Acc@1 68.75% (65.12%) cross_entropy 4.149 (4.915) soft_entropy 4.728 (5.885) softmax_triplet 0.135 (0.395) soft_softmax_triplet 0.136 (0.408)
Epoch: [64][ 90/400] Time 0.410 (0.420) Acc@1 79.69% (66.86%) cross_entropy 4.661 (4.848) soft_entropy 6.480 (5.843) softmax_triplet 0.494 (0.403) soft_softmax_triplet 0.562 (0.410)
Epoch: [64][100/400] Time 0.409 (0.420) Acc@1 89.06% (68.83%) cross_entropy 3.709 (4.748) soft_entropy 5.128 (5.814) softmax_triplet 0.023 (0.396) soft_softmax_triplet 0.024 (0.410)
Epoch: [64][110/400] Time 0.409 (0.419) Acc@1 87.50% (70.33%) cross_entropy 3.880 (4.690) soft_entropy 5.349 (5.803) softmax_triplet 0.034 (0.400) soft_softmax_triplet 0.036 (0.420)
Epoch: [64][120/400] Time 0.412 (0.418) Acc@1 89.06% (71.73%) cross_entropy 4.067 (4.626) soft_entropy 6.327 (5.783) softmax_triplet 0.033 (0.378) soft_softmax_triplet 0.067 (0.399)
Epoch: [64][130/400] Time 0.403 (0.417) Acc@1 81.25% (72.52%) cross_entropy 4.177 (4.597) soft_entropy 6.546 (5.798) softmax_triplet 0.301 (0.375) soft_softmax_triplet 0.345 (0.396)
Epoch: [64][140/400] Time 0.410 (0.419) Acc@1 98.44% (73.85%) cross_entropy 3.169 (4.514) soft_entropy 4.617 (5.756) softmax_triplet 0.005 (0.365) soft_softmax_triplet 0.006 (0.388)
Epoch: [64][150/400] Time 0.411 (0.419) Acc@1 96.88% (75.03%) cross_entropy 3.230 (4.454) soft_entropy 5.414 (5.738) softmax_triplet 0.034 (0.358) soft_softmax_triplet 0.042 (0.380)
Epoch: [64][160/400] Time 0.411 (0.418) Acc@1 79.69% (75.90%) cross_entropy 3.972 (4.407) soft_entropy 5.432 (5.724) softmax_triplet 0.157 (0.362) soft_softmax_triplet 0.159 (0.386)
Epoch: [64][170/400] Time 0.411 (0.418) Acc@1 82.81% (76.35%) cross_entropy 3.553 (4.380) soft_entropy 4.720 (5.720) softmax_triplet 0.021 (0.358) soft_softmax_triplet 0.022 (0.381)
Epoch: [64][180/400] Time 0.411 (0.419) Acc@1 89.06% (77.10%) cross_entropy 3.552 (4.347) soft_entropy 5.564 (5.717) softmax_triplet 0.395 (0.358) soft_softmax_triplet 0.557 (0.379)
Epoch: [64][190/400] Time 0.412 (0.419) Acc@1 90.62% (77.83%) cross_entropy 3.651 (4.306) soft_entropy 5.183 (5.714) softmax_triplet 0.300 (0.359) soft_softmax_triplet 0.304 (0.381)
Epoch: [64][200/400] Time 0.412 (0.418) Acc@1 90.62% (78.39%) cross_entropy 3.595 (4.274) soft_entropy 4.808 (5.704) softmax_triplet 0.449 (0.358) soft_softmax_triplet 0.450 (0.381)
Epoch: [64][210/400] Time 0.413 (0.418) Acc@1 92.19% (78.92%) cross_entropy 3.538 (4.244) soft_entropy 5.474 (5.696) softmax_triplet 0.094 (0.354) soft_softmax_triplet 0.184 (0.376)
Epoch: [64][220/400] Time 0.750 (0.419) Acc@1 87.50% (79.31%) cross_entropy 3.932 (4.221) soft_entropy 6.867 (5.692) softmax_triplet 1.262 (0.357) soft_softmax_triplet 1.503 (0.380)
Epoch: [64][230/400] Time 0.412 (0.419) Acc@1 93.75% (79.82%) cross_entropy 3.531 (4.186) soft_entropy 5.543 (5.678) softmax_triplet 0.045 (0.351) soft_softmax_triplet 0.052 (0.374)
Epoch: [64][240/400] Time 0.412 (0.419) Acc@1 95.31% (80.30%) cross_entropy 3.225 (4.156) soft_entropy 5.377 (5.669) softmax_triplet 0.010 (0.344) soft_softmax_triplet 0.106 (0.369)
Epoch: [64][250/400] Time 0.451 (0.419) Acc@1 92.19% (80.66%) cross_entropy 3.677 (4.138) soft_entropy 6.138 (5.661) softmax_triplet 0.128 (0.346) soft_softmax_triplet 0.222 (0.369)
Epoch: [64][260/400] Time 0.441 (0.421) Acc@1 87.50% (80.92%) cross_entropy 3.759 (4.119) soft_entropy 5.217 (5.653) softmax_triplet 0.451 (0.345) soft_softmax_triplet 0.512 (0.367)
Epoch: [64][270/400] Time 0.453 (0.423) Acc@1 85.94% (81.24%) cross_entropy 3.600 (4.103) soft_entropy 5.220 (5.654) softmax_triplet 0.057 (0.350) soft_softmax_triplet 0.062 (0.372)
Epoch: [64][280/400] Time 0.451 (0.424) Acc@1 100.00% (81.68%) cross_entropy 2.953 (4.078) soft_entropy 5.246 (5.647) softmax_triplet 0.054 (0.342) soft_softmax_triplet 0.060 (0.365)
Epoch: [64][290/400] Time 0.452 (0.425) Acc@1 89.06% (81.97%) cross_entropy 3.536 (4.061) soft_entropy 4.614 (5.650) softmax_triplet 0.003 (0.345) soft_softmax_triplet 0.005 (0.368)
Epoch: [64][300/400] Time 0.449 (0.426) Acc@1 93.75% (82.25%) cross_entropy 3.347 (4.047) soft_entropy 5.860 (5.653) softmax_triplet 0.228 (0.353) soft_softmax_triplet 0.345 (0.373)
Epoch: [64][310/400] Time 0.413 (0.427) Acc@1 89.06% (82.50%) cross_entropy 3.414 (4.033) soft_entropy 5.487 (5.649) softmax_triplet 0.049 (0.352) soft_softmax_triplet 0.055 (0.374)
Epoch: [64][320/400] Time 0.411 (0.427) Acc@1 98.44% (82.80%) cross_entropy 2.885 (4.014) soft_entropy 4.540 (5.649) softmax_triplet 0.002 (0.353) soft_softmax_triplet 0.003 (0.373)
Epoch: [64][330/400] Time 0.413 (0.426) Acc@1 96.88% (82.99%) cross_entropy 3.345 (4.004) soft_entropy 5.611 (5.654) softmax_triplet 0.732 (0.361) soft_softmax_triplet 0.729 (0.379)
Epoch: [64][340/400] Time 0.410 (0.426) Acc@1 89.06% (83.21%) cross_entropy 3.622 (3.990) soft_entropy 5.257 (5.652) softmax_triplet 0.116 (0.360) soft_softmax_triplet 0.207 (0.379)
Epoch: [64][350/400] Time 0.406 (0.425) Acc@1 93.75% (83.40%) cross_entropy 3.257 (3.978) soft_entropy 5.324 (5.645) softmax_triplet 0.071 (0.359) soft_softmax_triplet 0.072 (0.376)
Epoch: [64][360/400] Time 0.188 (0.422) Acc@1 45.31% (82.93%) cross_entropy nan (nan) soft_entropy nan (nan) softmax_triplet nan (nan) soft_softmax_triplet nan (nan)
Epoch: [64][370/400] Time 0.182 (0.416) Acc@1 46.88% (81.94%) cross_entropy nan (nan) soft_entropy nan (nan) softmax_triplet nan (nan) soft_softmax_triplet nan (nan)
Epoch: [64][380/400] Time 0.183 (0.410) Acc@1 39.06% (80.98%) cross_entropy nan (nan) soft_entropy nan (nan) softmax_triplet nan (nan) soft_softmax_triplet nan (nan)
Epoch: [64][390/400] Time 0.181 (0.404) Acc@1 45.31% (80.07%) cross_entropy nan (nan) soft_entropy nan (nan) softmax_triplet nan (nan) soft_softmax_triplet nan (nan)
==> Val on the no.0 model
************************* Start validating market1501 on epoch 64 *************************
Val: [ 0/18] Time 0.112 (0.112) Data 0.071 (0.071)
Val: [10/18] Time 0.030 (0.038) Data 0.000 (0.006)
Mean AP: 2.0%
CMC Scores: top-1 0.4% top-5 1.4% top-10 1.4%
Validating time: 0:00:00.967822
************************* Finished validating *************************
==> Val on the no.1 model
************************* Start validating market1501 on epoch 64 *************************
Val: [ 0/18] Time 0.109 (0.109) Data 0.068 (0.068)
Val: [10/18] Time 0.031 (0.038) Data 0.000 (0.006)
Mean AP: 96.5%
CMC Scores: top-1 98.1% top-5 99.5% top-10 99.9%
Validating time: 0:00:01.165367
************************* Finished validating *************************
- Finished epoch 64 mAP: 96.5% best: 96.5% *
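
In case it helps with narrowing this down, here is a small sketch for finding the first logged iteration where a loss is nan or inf in a log like the one above. It is not part of OpenUnReID; the function name `first_non_finite` and both regular expressions are my own assumptions, based only on the log format shown in this issue.

```python
import math
import re

# Assumed entry format, taken from the log above:
# "Epoch: [64][360/400] Time ... cross_entropy nan (nan) ..."
ENTRY_RE = re.compile(
    r"Epoch: \[(\d+)\]\[\s*(\d+)/(\d+)\](.*?)(?=Epoch: \[|$)", re.S
)
# Assumed meter format: "<name> <current value> (<running average>)".
# Percent meters such as "Acc@1 45.31% (...)" do not match and are skipped.
METER_RE = re.compile(r"([A-Za-z_]\w*) (-?(?:nan|inf|\d+(?:\.\d+)?)) \(")


def first_non_finite(log_text):
    """Return (epoch, iteration, meter_names) for the first log entry whose
    current value is nan or inf for some meter, or None if all are finite."""
    for match in ENTRY_RE.finditer(log_text):
        epoch, iteration = int(match.group(1)), int(match.group(2))
        bad = [
            name
            for name, value in METER_RE.findall(match.group(4))
            if not math.isfinite(float(value))
        ]
        if bad:
            return epoch, iteration, bad
    return None
```

Run on the log above, this should point at the entry for iteration 360 of epoch 64. Since the log prints every 10 iterations, the losses must have turned nan somewhere between iterations 350 and 360.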