
score 0 for GPT4o_20241120 model on CCOCR_DocParsing_TablePhotoEng

nutsintheshell opened this issue 9 months ago

I tested gpt4o_20241120 on CCOCR_DocParsing_TablePhotoEng. However, the score seems to be 0. Here is an example:

gt:

```
<table>
  <tr><td>code</td><td>left side label</td><td>right side label</td><td>dump die number</td></tr>
  <tr><td>lla</td><td>left</td><td>left</td><td>11</td></tr>
  <tr><td>llb</td><td>left</td><td>left</td><td>2</td></tr>
  <tr><td>lra</td><td>left</td><td>right</td><td>1</td></tr>
  <tr><td>lrb</td><td>left</td><td>right</td><td>2</td></tr>
  <tr><td>rla</td><td>right</td><td>left</td><td>1</td></tr>
  <tr><td>rlb</td><td>right</td><td>left</td><td>2</td></tr>
  <tr><td>rra</td><td>right</td><td>right</td><td>1</td></tr>
  <tr><td>rrb</td><td>right</td><td>right</td><td>2</td></tr>
  <tr><td>a</td><td>no label</td><td>no label</td><td>1</td></tr>
  <tr><td>b</td><td>no label</td><td>no label</td><td>2</td></tr>
</table>
```

prediction:

```
Below is the HTML representation of the table depicted in the image using <tr> and <td> tags:

html
<table border="1">
  <tr>
    <td>code</td>
    <td>left side label</td>
    <td>right side label</td>
    <td>dump die number</td>
  </tr>
  <tr>
    <td>lla</td>
    <td>left</td>
    <td>left</td>
    <td>1</td>
  </tr>
  <tr>
    <td>llb</td>
    <td>left</td>
    <td>left</td>
    <td>2</td>
  </tr>
  <tr>
    <td>lra</td>
    <td>left</td>
    <td>right</td>
    <td>1</td>
  </tr>
  <tr>
    <td>lrb</td>
    <td>left</td>
    <td>right</td>
    <td>2</td>
  </tr>
  <tr>
    <td>rla</td>
    <td>right</td>
    <td>left</td>
    <td>1</td>
  </tr>
  <tr>
    <td>rlb</td>
    <td>right</td>
    <td>left</td>
    <td>2</td>
  </tr>
  <tr>
    <td>rra</td>
    <td>right</td>
    <td>right</td>
    <td>1</td>
  </tr>
  <tr>
    <td>rrb</td>
    <td>right</td>
    <td>right</td>
    <td>2</td>
  </tr>
  <tr>
    <td>a</td>
    <td>no label</td>
    <td>no label</td>
    <td>1</td>
  </tr>
  <tr>
    <td>b</td>
    <td>no label</td>
    <td>no label</td>
    <td>2</td>
  </tr>
</table>

This HTML faithfully transcribes the table from the synthetic image provided. Each <tr> (table row) contains <td> (table data cells) entries corresponding to the data from the image.
```

score: 0

It seems that the extra text around the `<table border="1">` element in the prediction makes the score 0. The problem is that after `pred = html.fromstring(pred, parser=parser)`, `pred.xpath("body/table")` is an empty list, which leads to a score of 0.

Could someone please help me fix it?

Also, could you please provide the performance of different models on CC-OCR? I've checked your Hugging Face spaces but only found OCRBench. Thanks.

nutsintheshell · Mar 13 '25

Hi @nutsintheshell. I'd like to know your execution command so I can try to reproduce it. By the way, we do not provide all dataset results in VLMEvalKit; you can refer to https://huggingface.co/datasets/wulipc/CC-OCR for the performance of different models on CC-OCR.

FangXinyu-0913 · Mar 17 '25

Hi @nutsintheshell,

I have conducted the evaluation and also observed a score close to 0 (0.001, actually).

@wulipc, would you please help check whether this result is expected? The evaluation records are attached below.

GPT4o_20241120_CCOCR_DocParsing_TablePhotoEng_eval.json
GPT4o_20241120_CCOCR_DocParsing_TablePhotoEng.xlsx

kennymckormick · Apr 08 '25

@kennymckormick Thank you for your interest in CC-OCR. We have updated the table extraction logic in the doc_parsing evaluation code to handle more scenarios. For details, see [here](https://github.com/AlibabaResearch/AdvancedLiterateMachinery/commit/851672c8005bef8b7cb38c79c51a4fbf9a66d970). You can copy this file to vlmeval/dataset/utils/ccocr_evaluator/doc_parsing_evaluator.py to use it. We will submit a PR to VLMEvalKit with this update in the future, so stay tuned!

wulipc · Apr 09 '25