TFD-ICDAR2019

Bounding boxes are displaced from math regions

Open VladimirKalachikhin opened this issue 4 years ago • 17 comments

Yes, I rendered the images at the sizes listed in the file_sizes file, but the bounding boxes are still completely displaced.

I see that page numbering in the math_gt .csv files starts from 0, but convert_pdf_to_image.py numbers the pages it creates from 1. Also, convert_pdf_to_image.py creates images whose sizes differ from those in file_sizes.

I wrote my own convert_pdf_to_image, which renders the images at the correct sizes. I tried numbering the pages from 0 and from 1. Neither helped.
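To rule out the page-numbering offset described above, the ground-truth rows can be grouped under the image file name they should belong to. This is only a minimal sketch: the row layout (page index, x_min, y_min, x_max, y_max), the `N.png` naming, and the helper name `boxes_by_image` are assumptions for illustration, not taken from the repository.

```python
import csv
import io
from collections import defaultdict


def boxes_by_image(gt_text, first_image_number=1):
    """Group ground-truth boxes by rendered image file name.

    Assumed row layout: page_index, x_min, y_min, x_max, y_max,
    where page_index starts at 0. Check this against your own files.
    """
    grouped = defaultdict(list)
    for row in csv.reader(io.StringIO(gt_text)):
        if not row:
            continue  # skip blank lines
        page_index = int(float(row[0]))
        box = tuple(float(v) for v in row[1:5])
        # Shift the 0-based page index to the numbering used by the images.
        image_name = f"{page_index + first_image_number}.png"
        grouped[image_name].append(box)
    return dict(grouped)


# Hypothetical sample rows, not real annotations.
sample = "0,10,20,110,60\n0,15,200,300,240\n1,50,50,90,70\n"
for name, boxes in boxes_by_image(sample).items():
    print(name, boxes)
```

If the images are numbered from 0 instead, pass `first_image_number=0`. When the boxes are still displaced with both settings, as reported here, the offset is probably not the cause.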

I tried http://aif.centre-mersenne.org/article/AIF_1970__20_1_493_0.pdf as AIF_1970_493_498.pdf

VladimirKalachikhin avatar Jun 15 '20 20:06 VladimirKalachikhin

Did you try other pdfs?

MaliParag avatar Jun 18 '20 14:06 MaliParag

Yes, I downloaded these files:
http://aif.centre-mersenne.org/article/AIF_1970__20_1_493_0.pdf , AIF_1970_493_498.pdf
http://aif.centre-mersenne.org/article/AIF_1999__49_2_375_0.pdf , AIF_1999_375_404.pdf
http://www.numdam.org/article/ASENS_1970_4_3_3_273_0.pdf , ASENS_1970_273_284.pdf
http://www.numdam.org/article/ASENS_1997_4_30_3_367_0.pdf , ASENS_1997_367_384.pdf
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC323452/pdf/pnas00314-0027.pdf , Borcherds86.pdf
http://www.numdam.org/article/BSMF_1970__98__165_0.pdf , BSMF_1970_165_192.pdf
http://www.numdam.org/article/BSMF_1998__126_2_245_0.pdf , BSMF_1998_245_271.pdf
http://people.virginia.edu/~lls2l/finite_dimensional.pdf , Cline88.pdf

Other files are unavailable.

The bounding boxes are placed correctly on the math regions only for Borcherds86.pdf and Cline88.pdf. For the other files, the bounding boxes are completely displaced.

VladimirKalachikhin avatar Jun 18 '20 15:06 VladimirKalachikhin

Dear sir, I get the same error. The boxes are displaced for 9 PDF files: AIF_1970_493_498, AIF_1999_375_404, ASENS_1970_273_284, Bergweiler83, BSMF_1970_165_192, BSMF_1998_245_271, InvM_1970_121_134, MA_1970_26_38, MA_1977_275_292. The others match the labels well. One example is AIF_1999_375_404.pdf, page 1.png.

BigPandaCPU avatar Jul 01 '20 14:07 BigPandaCPU

Which version of pdf2image are you using?

I think I used the following version:

Name: pdf2image
Version: 1.5.4
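For anyone comparing environments, the installed version can also be read from Python itself. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is made up for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version of pdf2image, or None if it is not installed.
print("pdf2image:", installed_version("pdf2image"))
```

Matching versions alone may not be enough, since the rendering is delegated to the poppler tools installed on the system.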

MaliParag avatar Jul 20 '20 19:07 MaliParag

Many PDF links are not available. Does anyone have a package of all the PDF files? Could you share a link via Google Drive, Baidu, or something else? Thanks.

macqueen09 avatar Aug 10 '20 07:08 macqueen09

The answer to questions: https://github.com/VladimirKalachikhin/marmot-to-ICDAR

VladimirKalachikhin avatar Sep 12 '20 20:09 VladimirKalachikhin

@MaliParag I got the same problem on AIF_1999_375_404.pdf, page 2.png, with pdf2image==1.5.4.

humeme avatar Nov 12 '20 08:11 humeme

Hi @VladimirKalachikhin , I have the same problem as you. I found that some images do not match their corresponding GT. Have you solved this problem now? Thank you!

Jeozhao avatar Jan 14 '21 05:01 Jeozhao

Hi @MaliParag ,

Could you please share your image dataset with us? I found that different download channels and different versions of the PDF-to-PNG conversion tool may cause the images not to match the GT. We would be very grateful if you could share your dataset with us.

Jeozhao avatar Jan 14 '21 05:01 Jeozhao

> Have you solved this problem now?

I used the MARMOT dataset; see above.

VladimirKalachikhin avatar Jan 14 '21 08:01 VladimirKalachikhin

Hi @VladimirKalachikhin, can this data be converted to match TFD-ICDAR2019? Or is it only the format that can be made consistent, while the content differs? Thanks!

Jeozhao avatar Jan 14 '21 08:01 Jeozhao

I don't quite understand you. MARMOT is just another dataset. I created a simple tool to convert MARMOT to an ICDAR-compatible format so that the ICDAR tools can be used with it.

VladimirKalachikhin avatar Jan 14 '21 08:01 VladimirKalachikhin

Thank you for your reply. I understand what you mean now.

Jeozhao avatar Jan 14 '21 08:01 Jeozhao

> Dear sir, I get the same error. The boxes are displaced for 9 PDF files: AIF_1970_493_498, AIF_1999_375_404, ASENS_1970_273_284, Bergweiler83, BSMF_1970_165_192, BSMF_1998_245_271, InvM_1970_121_134, MA_1970_26_38, MA_1977_275_292. The others match the labels well. One example is AIF_1999_375_404.pdf, page 1.png.

Could you share all the image datasets that you created? Thank you very much!

ducMNSD avatar Feb 20 '21 15:02 ducMNSD

I got the data from here: https://github.com/MaliParag/TFD-ICDAR2019#download-instructions

The download link file.

BigPandaCPU avatar Feb 24 '21 01:02 BigPandaCPU

NOTE: If you find that the bounding boxes are displaced from the math regions, it is because the document image that you rendered has a different size than the one used during annotation. datasetV2 provides file sizes for each image. Resize the image that you rendered to the size provided in datasetV2 and you should be able to use the annotations.
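An equivalent way to apply this note, when re-rendering is inconvenient, is to leave the image as it is and rescale the annotation coordinates instead. The sketch below assumes boxes are given as (x_min, y_min, x_max, y_max) in pixels and sizes as (width, height); the function name `scale_box` is hypothetical. It only corrects a pure size mismatch and would not help if the downloaded PDF has different page content or margins than the annotated one.

```python
def scale_box(box, rendered_size, annotated_size):
    """Map a box from annotated-image pixels to rendered-image pixels.

    box:            (x_min, y_min, x_max, y_max) in the annotated image
    rendered_size:  (width, height) of the image you rendered
    annotated_size: (width, height) listed for that page in datasetV2
    """
    x_min, y_min, x_max, y_max = box
    scale_x = rendered_size[0] / annotated_size[0]
    scale_y = rendered_size[1] / annotated_size[1]
    return (x_min * scale_x, y_min * scale_y, x_max * scale_x, y_max * scale_y)


# Hypothetical numbers: annotations made at 2000x3000, page rendered at 1000x1500.
print(scale_box((100, 200, 300, 400), (1000, 1500), (2000, 3000)))
```

If the horizontal and vertical scale factors differ noticeably, the rendered page probably has a different aspect ratio from the annotated one, which suggests a different source PDF rather than a different resolution.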

MingchangLi avatar Jul 30 '21 22:07 MingchangLi

> datasetV2 provides file sizes for each image.

I know.

VladimirKalachikhin avatar Jul 31 '21 07:07 VladimirKalachikhin