DeepSolo

question about evaluation

Open HumanZhong opened this issue 1 year ago • 9 comments

Hi,

I used the provided config configs/R_50/pretrain/150k_tt.yaml to pretrain on SynText150K and TotalText. During pretraining, an evaluation runs every 10000 iters and gives an evaluation result A. But when pretraining finished, I evaluated the saved checkpoint with your provided evaluation command and got a different evaluation result B.

Why is there a difference between A and B? Which result is valid?

HumanZhong avatar Jun 01 '23 04:06 HumanZhong

I have noticed this before. It is probably because of multi-GPU testing, but the difference is usually small. Use the result from single-GPU, batch-size-1 testing as the final performance.
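For context on why multi-GPU testing can change the numbers: detectron2-style distributed evaluation splits the test set into one contiguous shard per rank, which is why the training log below reports 75 batches on 4 devices while the standalone run reports 300 batches on 1 device. A minimal sketch of that sharding, simplified from detectron2's `InferenceSampler` (not the actual implementation):

```python
# Simplified sketch of detectron2-style test-set sharding across ranks.
# Each rank evaluates a contiguous slice; the first (total % world_size)
# ranks get one extra sample when the dataset doesn't divide evenly.
def shard_indices(total_size: int, world_size: int, rank: int) -> list[int]:
    """Return the contiguous slice of dataset indices evaluated by `rank`."""
    shard = total_size // world_size
    left = total_size % world_size
    sizes = [shard + int(r < left) for r in range(world_size)]
    begin = sum(sizes[:rank])
    return list(range(begin, begin + sizes[rank]))

# TotalText has 300 test images: 4 GPUs -> 75 images per device,
# 1 GPU -> all 300, matching the two logs in this thread.
for rank in range(4):
    print(rank, len(shard_indices(300, 4, rank)))
```

The per-rank predictions are gathered on the main process before scoring, so in principle the merged results should match a single-GPU run; small differences can still creep in from batching and padding effects.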

ymy-k avatar Jun 01 '23 04:06 ymy-k

BTW, is your result A from 4 GPUs or 8 GPUs during pretraining?

ymy-k avatar Jun 01 '23 04:06 ymy-k

My result A is from 4 V100 GPUs.

And the difference seems quite large. I got 76.88 (detection) and 67.91 (e2e) in result A, but 81.70 and 71.97 respectively in result B. The checkpoint I used is from 325000 iters.

HumanZhong avatar Jun 01 '23 05:06 HumanZhong

Are you sure you used the checkpoint from 325K iters? You said the model was evaluated every 10K, so the checkpoint at 320K or 330K should be used for comparison.

ymy-k avatar Jun 01 '23 06:06 ymy-k

Sorry for my inaccurate description: I changed TEST.EVAL_PERIOD from 10000 to 5000, so the evaluation is performed every 5000 iters.
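For reference, in detectron2-style configs this kind of change is a one-key override. A hypothetical snippet in the style of the repo's yaml configs (the actual 150k_tt.yaml may lay this out differently):

```yaml
# Hypothetical override in detectron2 config style; evaluate every 5000
# iters instead of every 10000. Checkpoints are saved on a separate
# schedule (SOLVER.CHECKPOINT_PERIOD), which is why an eval at 325K can
# line up with a saved model_0324999.pth.
TEST:
  EVAL_PERIOD: 5000
```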

HumanZhong avatar Jun 01 '23 06:06 HumanZhong

That's strange. I mostly observe no difference, and when there is one, it is no more than 0.5%.

ymy-k avatar Jun 01 '23 06:06 ymy-k

This is the training log around 325k iters:

[05/28 20:12:58] d2.utils.events INFO:  eta: 4:47:03  iter: 324939  total_loss: 3.596  loss_ce: 0.02461  loss_texts: 0.1355  loss_ctrl_points: 0.1573  loss_bd_points: 0.1974  loss_ce_0: 0.05473  loss_texts_0: 0.2118  loss_ctrl_points_0: 0.1602  loss_bd_points_0: 0.205  loss_ce_1: 0.03356  loss_texts_1: 0.176  loss_ctrl_points_1: 0.159  loss_bd_points_1: 0.2034  loss_ce_2: 0.02931  loss_texts_2: 0.1587  loss_ctrl_points_2: 0.1562  loss_bd_points_2: 0.1994  loss_ce_3: 0.02547  loss_texts_3: 0.1516  loss_ctrl_points_3: 0.1554  loss_bd_points_3: 0.1974  loss_ce_4: 0.02463  loss_texts_4: 0.1402  loss_ctrl_points_4: 0.1567  loss_bd_points_4: 0.1975  loss_ce_enc: 0.03944  loss_bezier_enc: 0.1588  time: 0.7015  data_time: 0.0197  lr: 1e-05  max_mem: 17172M
[05/28 20:13:12] d2.utils.events INFO:  eta: 4:47:09  iter: 324959  total_loss: 3.539  loss_ce: 0.02389  loss_texts: 0.1312  loss_ctrl_points: 0.1642  loss_bd_points: 0.1995  loss_ce_0: 0.06157  loss_texts_0: 0.2065  loss_ctrl_points_0: 0.162  loss_bd_points_0: 0.211  loss_ce_1: 0.03943  loss_texts_1: 0.1676  loss_ctrl_points_1: 0.158  loss_bd_points_1: 0.2078  loss_ce_2: 0.0322  loss_texts_2: 0.1474  loss_ctrl_points_2: 0.1645  loss_bd_points_2: 0.201  loss_ce_3: 0.02726  loss_texts_3: 0.1419  loss_ctrl_points_3: 0.1655  loss_bd_points_3: 0.2002  loss_ce_4: 0.02459  loss_texts_4: 0.1351  loss_ctrl_points_4: 0.1664  loss_bd_points_4: 0.2005  loss_ce_enc: 0.04377  loss_bezier_enc: 0.1702  time: 0.7015  data_time: 0.0250  lr: 1e-05  max_mem: 17172M
[05/28 20:13:26] d2.utils.events INFO:  eta: 4:46:45  iter: 324979  total_loss: 3.73  loss_ce: 0.01943  loss_texts: 0.1244  loss_ctrl_points: 0.1631  loss_bd_points: 0.212  loss_ce_0: 0.05605  loss_texts_0: 0.1937  loss_ctrl_points_0: 0.1676  loss_bd_points_0: 0.2171  loss_ce_1: 0.03501  loss_texts_1: 0.1661  loss_ctrl_points_1: 0.1619  loss_bd_points_1: 0.2171  loss_ce_2: 0.02312  loss_texts_2: 0.1362  loss_ctrl_points_2: 0.1638  loss_bd_points_2: 0.2162  loss_ce_3: 0.01981  loss_texts_3: 0.1215  loss_ctrl_points_3: 0.1634  loss_bd_points_3: 0.2144  loss_ce_4: 0.0192  loss_texts_4: 0.1228  loss_ctrl_points_4: 0.1626  loss_bd_points_4: 0.211  loss_ce_enc: 0.04247  loss_bezier_enc: 0.1543  time: 0.7015  data_time: 0.0272  lr: 1e-05  max_mem: 17172M
[05/28 20:13:40] fvcore.common.checkpoint INFO: Saving checkpoint to output/R50/150k_tt/pretrain/model_0324999.pth
[05/28 20:13:41] d2.data.dataset_mapper INFO: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(1000, 1000), max_size=1892, sample_style='choice')]
[05/28 20:13:41] adet.data.dataset_mapper INFO: Rebuilding the augmentations. The previous augmentations will be overridden.
[05/28 20:13:41] adet.data.datasets.text INFO: Loaded 300 images in COCO format from /dataset/totaltext_bezier/totaltext/test.json
[05/28 20:13:42] d2.data.common INFO: Serializing 300 elements to byte tensors and concatenating them all ...
[05/28 20:13:42] d2.data.common INFO: Serialized dataset takes 2.86 MiB
[05/28 20:13:42] d2.evaluation.evaluator INFO: Start inference on 75 batches
[05/28 20:14:20] d2.evaluation.evaluator INFO: Inference done 11/75. Dataloading: 0.0022 s/iter. Inference: 0.2113 s/iter. Eval: 0.0008 s/iter. Total: 0.2143 s/iter. ETA=0:00:13
[05/28 20:14:25] d2.evaluation.evaluator INFO: Inference done 41/75. Dataloading: 0.0041 s/iter. Inference: 0.1730 s/iter. Eval: 0.0010 s/iter. Total: 0.1782 s/iter. ETA=0:00:06
[05/28 20:14:30] d2.evaluation.evaluator INFO: Inference done 70/75. Dataloading: 0.0043 s/iter. Inference: 0.1714 s/iter. Eval: 0.0011 s/iter. Total: 0.1769 s/iter. ETA=0:00:00
[05/28 20:14:31] d2.evaluation.evaluator INFO: Total inference time: 0:00:12.789866 (0.182712 s / iter per device, on 4 devices)
[05/28 20:14:31] d2.evaluation.evaluator INFO: Total inference pure compute time: 0:00:11 (0.171290 s / iter per device, on 4 devices)
[05/28 20:14:32] adet.evaluation.text_evaluation_all INFO: Saving results to output/R50/150k_tt/pretrain/inference/text_results.json
[05/28 20:14:49] d2.engine.defaults INFO: Evaluation results for totaltext_test in csv format:
[05/28 20:14:49] d2.evaluation.testing INFO: copypaste: Task: DETECTION_ONLY_RESULTS
[05/28 20:14:49] d2.evaluation.testing INFO: copypaste: precision,recall,hmean
[05/28 20:14:49] d2.evaluation.testing INFO: copypaste: 0.9353,0.6527,0.7688
[05/28 20:14:49] d2.evaluation.testing INFO: copypaste: Task: None-E2E_RESULTS
[05/28 20:14:49] d2.evaluation.testing INFO: copypaste: precision,recall,hmean
[05/28 20:14:49] d2.evaluation.testing INFO: copypaste: 0.7967,0.5917,0.6791
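As a sanity check on the copypaste lines, hmean is the harmonic mean (F1) of precision and recall, so the reported triples are internally consistent:

```python
# hmean (F1) is the harmonic mean of precision and recall.
def hmean(p: float, r: float) -> float:
    return 2 * p * r / (p + r)

# Recomputed from the rounded precision/recall in the log above;
# close to the logged 0.7688 (the evaluator works from raw counts,
# so the last digit can differ after rounding).
print(hmean(0.9353, 0.6527))
```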

And this is the evaluation log:

[06/01 11:47:57] fvcore.common.checkpoint INFO: [Checkpointer] Loading from /code/DeepSolo_new/output/R50/150k_tt/pretrain/model_0324999.pth ...
[06/01 11:48:04] d2.data.dataset_mapper INFO: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(1000, 1000), max_size=1892, sample_style='choice')]
[06/01 11:48:04] adet.data.dataset_mapper INFO: Rebuilding the augmentations. The previous augmentations will be overridden.
[06/01 11:48:04] adet.data.datasets.text INFO: Loaded 300 images in COCO format from /dataset/totaltext_bezier/totaltext/test.json
[06/01 11:48:04] d2.data.common INFO: Serializing 300 elements to byte tensors and concatenating them all ...
[06/01 11:48:04] d2.data.common INFO: Serialized dataset takes 2.86 MiB
[06/01 11:48:04] d2.evaluation.evaluator INFO: Start inference on 300 batches
[06/01 11:48:09] d2.evaluation.evaluator INFO: Inference done 11/300. Dataloading: 0.0018 s/iter. Inference: 0.1322 s/iter. Eval: 0.0151 s/iter. Total: 0.1491 s/iter. ETA=0:00:43
[06/01 11:48:14] d2.evaluation.evaluator INFO: Inference done 52/300. Dataloading: 0.0019 s/iter. Inference: 0.1208 s/iter. Eval: 0.0028 s/iter. Total: 0.1255 s/iter. ETA=0:00:31
[06/01 11:48:19] d2.evaluation.evaluator INFO: Inference done 94/300. Dataloading: 0.0019 s/iter. Inference: 0.1195 s/iter. Eval: 0.0020 s/iter. Total: 0.1235 s/iter. ETA=0:00:25
[06/01 11:48:24] d2.evaluation.evaluator INFO: Inference done 135/300. Dataloading: 0.0020 s/iter. Inference: 0.1193 s/iter. Eval: 0.0018 s/iter. Total: 0.1231 s/iter. ETA=0:00:20
[06/01 11:48:29] d2.evaluation.evaluator INFO: Inference done 178/300. Dataloading: 0.0020 s/iter. Inference: 0.1183 s/iter. Eval: 0.0016 s/iter. Total: 0.1221 s/iter. ETA=0:00:14
[06/01 11:48:34] d2.evaluation.evaluator INFO: Inference done 220/300. Dataloading: 0.0020 s/iter. Inference: 0.1184 s/iter. Eval: 0.0015 s/iter. Total: 0.1221 s/iter. ETA=0:00:09
[06/01 11:48:39] d2.evaluation.evaluator INFO: Inference done 261/300. Dataloading: 0.0020 s/iter. Inference: 0.1188 s/iter. Eval: 0.0015 s/iter. Total: 0.1225 s/iter. ETA=0:00:04
[06/01 11:48:45] d2.evaluation.evaluator INFO: Total inference time: 0:00:37.121880 (0.125837 s / iter per device, on 1 devices)
[06/01 11:48:45] d2.evaluation.evaluator INFO: Total inference pure compute time: 0:00:35 (0.118954 s / iter per device, on 1 devices)
[06/01 11:48:45] adet.evaluation.text_evaluation_all INFO: Saving results to output/R50/150k_tt/pretrain/inference/text_results.json
[06/01 11:49:04] d2.engine.defaults INFO: Evaluation results for totaltext_test in csv format:
[06/01 11:49:04] d2.evaluation.testing INFO: copypaste: Task: DETECTION_ONLY_RESULTS
[06/01 11:49:04] d2.evaluation.testing INFO: copypaste: precision,recall,hmean
[06/01 11:49:04] d2.evaluation.testing INFO: copypaste: 0.9135,0.7389,0.8170
[06/01 11:49:04] d2.evaluation.testing INFO: copypaste: Task: None-E2E_RESULTS
[06/01 11:49:04] d2.evaluation.testing INFO: copypaste: precision,recall,hmean
[06/01 11:49:04] d2.evaluation.testing INFO: copypaste: 0.7840,0.6652,0.7197

What was your evaluation performance around 325K iters for 150k_tt.yaml?

HumanZhong avatar Jun 01 '23 06:06 HumanZhong

Only the final model at 350K iters is provided in the pretrained model list.

ymy-k avatar Jun 01 '23 08:06 ymy-k

OK, thanks. I will test the provided pretrained model to check what's wrong. If there is any progress, I will report back on this issue.

HumanZhong avatar Jun 01 '23 09:06 HumanZhong