DeepSolo
question about evaluation
Hi,
I used the provided config configs/R_50/pretrain/150k_tt.yaml to pretrain on syntext150k and totaltext. During pretraining, an evaluation runs every 10000 iters and gives a result A. But when pretraining finished and I evaluated the saved checkpoint with your provided evaluation command, I got a different result B.
Why is there a difference between A and B? Which result is valid?
I have noticed this before. It is probably caused by multi-GPU testing, but the difference is usually small. Use the result from single-GPU, batch-size-1 testing as the final performance.
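For reference, a single-GPU evaluation can be launched with the usual detectron2-style entry point (a sketch; the checkpoint path below is illustrative):

python tools/train_net.py --config-file configs/R_50/pretrain/150k_tt.yaml --num-gpus 1 --eval-only MODEL.WEIGHTS /path/to/checkpoint.pth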
BTW, is your result A from 4 GPUs or 8 GPUs during pretraining?
My result A is from 4 V100 GPUs.
And the difference seems quite large: I got 76.88 and 67.91 for detection and e2e in result A, but 81.70 and 71.97 respectively in result B. The checkpoint I used is from 325000 iters.
Are you sure that you used the checkpoint from 325K iters? You said the model was evaluated every 10K, so the checkpoint at 320K or 330K should be used for comparison.
Sorry for the inaccurate description: I changed 'TEST.EVAL_PERIOD' from 10000 to 5000, so the evaluation is performed every 5000 iters.
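For anyone reproducing this, the period can also be changed without editing the yaml, by appending a detectron2 config override to the training command (a sketch, assuming the repo's tools/train_net.py entry point; my exact invocation may have differed):

python tools/train_net.py --config-file configs/R_50/pretrain/150k_tt.yaml --num-gpus 4 TEST.EVAL_PERIOD 5000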
That's strange. I mostly observe no difference, and when there is one, it is no more than 0.5%.
This is the training log around 325k iters:
[05/28 20:12:58] d2.utils.events INFO: eta: 4:47:03 iter: 324939 total_loss: 3.596 loss_ce: 0.02461 loss_texts: 0.1355 loss_ctrl_points: 0.1573 loss_bd_points: 0.1974 loss_ce_0: 0.05473 loss_texts_0: 0.2118 loss_ctrl_points_0: 0.1602 loss_bd_points_0: 0.205 loss_ce_1: 0.03356 loss_texts_1: 0.176 loss_ctrl_points_1: 0.159 loss_bd_points_1: 0.2034 loss_ce_2: 0.02931 loss_texts_2: 0.1587 loss_ctrl_points_2: 0.1562 loss_bd_points_2: 0.1994 loss_ce_3: 0.02547 loss_texts_3: 0.1516 loss_ctrl_points_3: 0.1554 loss_bd_points_3: 0.1974 loss_ce_4: 0.02463 loss_texts_4: 0.1402 loss_ctrl_points_4: 0.1567 loss_bd_points_4: 0.1975 loss_ce_enc: 0.03944 loss_bezier_enc: 0.1588 time: 0.7015 data_time: 0.0197 lr: 1e-05 max_mem: 17172M
[05/28 20:13:12] d2.utils.events INFO: eta: 4:47:09 iter: 324959 total_loss: 3.539 loss_ce: 0.02389 loss_texts: 0.1312 loss_ctrl_points: 0.1642 loss_bd_points: 0.1995 loss_ce_0: 0.06157 loss_texts_0: 0.2065 loss_ctrl_points_0: 0.162 loss_bd_points_0: 0.211 loss_ce_1: 0.03943 loss_texts_1: 0.1676 loss_ctrl_points_1: 0.158 loss_bd_points_1: 0.2078 loss_ce_2: 0.0322 loss_texts_2: 0.1474 loss_ctrl_points_2: 0.1645 loss_bd_points_2: 0.201 loss_ce_3: 0.02726 loss_texts_3: 0.1419 loss_ctrl_points_3: 0.1655 loss_bd_points_3: 0.2002 loss_ce_4: 0.02459 loss_texts_4: 0.1351 loss_ctrl_points_4: 0.1664 loss_bd_points_4: 0.2005 loss_ce_enc: 0.04377 loss_bezier_enc: 0.1702 time: 0.7015 data_time: 0.0250 lr: 1e-05 max_mem: 17172M
[05/28 20:13:26] d2.utils.events INFO: eta: 4:46:45 iter: 324979 total_loss: 3.73 loss_ce: 0.01943 loss_texts: 0.1244 loss_ctrl_points: 0.1631 loss_bd_points: 0.212 loss_ce_0: 0.05605 loss_texts_0: 0.1937 loss_ctrl_points_0: 0.1676 loss_bd_points_0: 0.2171 loss_ce_1: 0.03501 loss_texts_1: 0.1661 loss_ctrl_points_1: 0.1619 loss_bd_points_1: 0.2171 loss_ce_2: 0.02312 loss_texts_2: 0.1362 loss_ctrl_points_2: 0.1638 loss_bd_points_2: 0.2162 loss_ce_3: 0.01981 loss_texts_3: 0.1215 loss_ctrl_points_3: 0.1634 loss_bd_points_3: 0.2144 loss_ce_4: 0.0192 loss_texts_4: 0.1228 loss_ctrl_points_4: 0.1626 loss_bd_points_4: 0.211 loss_ce_enc: 0.04247 loss_bezier_enc: 0.1543 time: 0.7015 data_time: 0.0272 lr: 1e-05 max_mem: 17172M
[05/28 20:13:40] fvcore.common.checkpoint INFO: Saving checkpoint to output/R50/150k_tt/pretrain/model_0324999.pth
[05/28 20:13:41] d2.data.dataset_mapper INFO: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(1000, 1000), max_size=1892, sample_style='choice')]
[05/28 20:13:41] adet.data.dataset_mapper INFO: Rebuilding the augmentations. The previous augmentations will be overridden.
[05/28 20:13:41] adet.data.datasets.text INFO: Loaded 300 images in COCO format from /dataset/totaltext_bezier/totaltext/test.json
[05/28 20:13:42] d2.data.common INFO: Serializing 300 elements to byte tensors and concatenating them all ...
[05/28 20:13:42] d2.data.common INFO: Serialized dataset takes 2.86 MiB
[05/28 20:13:42] d2.evaluation.evaluator INFO: Start inference on 75 batches
[05/28 20:14:20] d2.evaluation.evaluator INFO: Inference done 11/75. Dataloading: 0.0022 s/iter. Inference: 0.2113 s/iter. Eval: 0.0008 s/iter. Total: 0.2143 s/iter. ETA=0:00:13
[05/28 20:14:25] d2.evaluation.evaluator INFO: Inference done 41/75. Dataloading: 0.0041 s/iter. Inference: 0.1730 s/iter. Eval: 0.0010 s/iter. Total: 0.1782 s/iter. ETA=0:00:06
[05/28 20:14:30] d2.evaluation.evaluator INFO: Inference done 70/75. Dataloading: 0.0043 s/iter. Inference: 0.1714 s/iter. Eval: 0.0011 s/iter. Total: 0.1769 s/iter. ETA=0:00:00
[05/28 20:14:31] d2.evaluation.evaluator INFO: Total inference time: 0:00:12.789866 (0.182712 s / iter per device, on 4 devices)
[05/28 20:14:31] d2.evaluation.evaluator INFO: Total inference pure compute time: 0:00:11 (0.171290 s / iter per device, on 4 devices)
[05/28 20:14:32] adet.evaluation.text_evaluation_all INFO: Saving results to output/R50/150k_tt/pretrain/inference/text_results.json
[05/28 20:14:49] d2.engine.defaults INFO: Evaluation results for totaltext_test in csv format:
[05/28 20:14:49] d2.evaluation.testing INFO: copypaste: Task: DETECTION_ONLY_RESULTS
[05/28 20:14:49] d2.evaluation.testing INFO: copypaste: precision,recall,hmean
[05/28 20:14:49] d2.evaluation.testing INFO: copypaste: 0.9353,0.6527,0.7688
[05/28 20:14:49] d2.evaluation.testing INFO: copypaste: Task: None-E2E_RESULTS
[05/28 20:14:49] d2.evaluation.testing INFO: copypaste: precision,recall,hmean
[05/28 20:14:49] d2.evaluation.testing INFO: copypaste: 0.7967,0.5917,0.6791
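(As a sanity check on these rows: hmean is the harmonic mean of precision and recall, 2*P*R / (P + R). For detection, 2 * 0.9353 * 0.6527 / (0.9353 + 0.6527) ≈ 0.7689, and for e2e, 2 * 0.7967 * 0.5917 / (0.7967 + 0.5917) ≈ 0.6791, matching the logged hmean values up to rounding of the underlying precision/recall.)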
And this is the evaluation log:
[06/01 11:47:57] fvcore.common.checkpoint INFO: [Checkpointer] Loading from /code/DeepSolo_new/output/R50/150k_tt/pretrain/model_0324999.pth ...
[06/01 11:48:04] d2.data.dataset_mapper INFO: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(1000, 1000), max_size=1892, sample_style='choice')]
[06/01 11:48:04] adet.data.dataset_mapper INFO: Rebuilding the augmentations. The previous augmentations will be overridden.
[06/01 11:48:04] adet.data.datasets.text INFO: Loaded 300 images in COCO format from /dataset/totaltext_bezier/totaltext/test.json
[06/01 11:48:04] d2.data.common INFO: Serializing 300 elements to byte tensors and concatenating them all ...
[06/01 11:48:04] d2.data.common INFO: Serialized dataset takes 2.86 MiB
[06/01 11:48:04] d2.evaluation.evaluator INFO: Start inference on 300 batches
[06/01 11:48:09] d2.evaluation.evaluator INFO: Inference done 11/300. Dataloading: 0.0018 s/iter. Inference: 0.1322 s/iter. Eval: 0.0151 s/iter. Total: 0.1491 s/iter. ETA=0:00:43
[06/01 11:48:14] d2.evaluation.evaluator INFO: Inference done 52/300. Dataloading: 0.0019 s/iter. Inference: 0.1208 s/iter. Eval: 0.0028 s/iter. Total: 0.1255 s/iter. ETA=0:00:31
[06/01 11:48:19] d2.evaluation.evaluator INFO: Inference done 94/300. Dataloading: 0.0019 s/iter. Inference: 0.1195 s/iter. Eval: 0.0020 s/iter. Total: 0.1235 s/iter. ETA=0:00:25
[06/01 11:48:24] d2.evaluation.evaluator INFO: Inference done 135/300. Dataloading: 0.0020 s/iter. Inference: 0.1193 s/iter. Eval: 0.0018 s/iter. Total: 0.1231 s/iter. ETA=0:00:20
[06/01 11:48:29] d2.evaluation.evaluator INFO: Inference done 178/300. Dataloading: 0.0020 s/iter. Inference: 0.1183 s/iter. Eval: 0.0016 s/iter. Total: 0.1221 s/iter. ETA=0:00:14
[06/01 11:48:34] d2.evaluation.evaluator INFO: Inference done 220/300. Dataloading: 0.0020 s/iter. Inference: 0.1184 s/iter. Eval: 0.0015 s/iter. Total: 0.1221 s/iter. ETA=0:00:09
[06/01 11:48:39] d2.evaluation.evaluator INFO: Inference done 261/300. Dataloading: 0.0020 s/iter. Inference: 0.1188 s/iter. Eval: 0.0015 s/iter. Total: 0.1225 s/iter. ETA=0:00:04
[06/01 11:48:45] d2.evaluation.evaluator INFO: Total inference time: 0:00:37.121880 (0.125837 s / iter per device, on 1 devices)
[06/01 11:48:45] d2.evaluation.evaluator INFO: Total inference pure compute time: 0:00:35 (0.118954 s / iter per device, on 1 devices)
[06/01 11:48:45] adet.evaluation.text_evaluation_all INFO: Saving results to output/R50/150k_tt/pretrain/inference/text_results.json
[06/01 11:49:04] d2.engine.defaults INFO: Evaluation results for totaltext_test in csv format:
[06/01 11:49:04] d2.evaluation.testing INFO: copypaste: Task: DETECTION_ONLY_RESULTS
[06/01 11:49:04] d2.evaluation.testing INFO: copypaste: precision,recall,hmean
[06/01 11:49:04] d2.evaluation.testing INFO: copypaste: 0.9135,0.7389,0.8170
[06/01 11:49:04] d2.evaluation.testing INFO: copypaste: Task: None-E2E_RESULTS
[06/01 11:49:04] d2.evaluation.testing INFO: copypaste: precision,recall,hmean
[06/01 11:49:04] d2.evaluation.testing INFO: copypaste: 0.7840,0.6652,0.7197
What about your evaluation performance around 325K iters for 150k_tt.yaml?
Only the final model at 350K iters is provided in the pretrained model list.
OK, thanks. I will test the provided pretrained model to check what's wrong. If there is any progress, I will report back on this issue.