unilm TextDiffusers: question about Mario-LAION annotations

Hi,

I found some wrong annotations with misplaced bounding boxes in Mario-LAION after I downloaded the images listed in url.txt.

Here are several examples:

Are these the examples with incorrect annotations, as mentioned in your paper, or do my downloaded images differ from the ones you used for the annotations? I want to check if my data processing is correct. And could you share more about the filtering? Are you using another ocr model to check if each bounding box contains the annotated text?

Thanks.

Jul 05 '23 19:07 Question406

Thanks for your attention to our work! Could you send us the index (xxxxx_xxxxxxxxx) of the samples that contain incorrect annotations? I will then check those samples. The filtering rules are described in Appendix E, and no additional OCR models are used.

Jul 05 '23 22:07 JingyeChen

Thanks for your response.

The examples above are randomly sampled; I forgot to keep the index :( But could you check on these examples?

00000_000003058 00000_000004343

The images were downloaded with this command, following your README: img2dataset --url_list=mario-laion-url.txt --output_folder=laion_ocr --thread_count=64 --image_size=512

Jul 05 '23 23:07 Question406

Thanks for your feedback. It is a mistake and the command should be:

img2dataset --url_list=url.txt --output_folder=laion_ocr --thread_count=64 --resize_mode=no

We will fix it in the readme file. Thanks!

Jul 06 '23 02:07 JingyeChen

I see. Thanks for your response. Additionally, could you kindly let me know when Mario-10M will be released?

Jul 06 '23 02:07 Question406

It is hard to say and may take some time. Please stay tuned ;D

Jul 06 '23 03:07 JingyeChen

It is hard to say and may take some time. Please stay tuned ;D

Hello, can you provide the training code for LayoutTransformer? I found that each text bbox is given as 4 points, i.e. [(x0, y0), (x1, y1), (x2, y2), (x3, y3)]; however, the released LayoutTransformer uses [x, y, x, y]. Can you provide more details on training LayoutTransformer?

Jul 06 '23 03:07 crj1998
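For anyone hitting the same 4-point vs. [x, y, x, y] mismatch, here is a minimal sketch of one possible conversion. It assumes that [x, y, x, y] means the top-left and bottom-right corners of the axis-aligned rectangle enclosing the four points; the thread does not confirm that this is how the authors prepared the LayoutTransformer training data, and the helper name quad_to_xyxy is made up for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def quad_to_xyxy(quad: Sequence[Point]) -> List[float]:
    """Return [x_min, y_min, x_max, y_max] enclosing a 4-point polygon.

    Assumption: the [x, y, x, y] format is (top-left, bottom-right) of the
    axis-aligned bounding rectangle. This is not confirmed by the authors.
    """
    if len(quad) != 4:
        raise ValueError("expected exactly four (x, y) points")
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return [min(xs), min(ys), max(xs), max(ys)]


# Hypothetical, slightly rotated text box.
example_box = quad_to_xyxy([(10, 20), (110, 25), (108, 60), (8, 55)])
print(example_box)
```

Note that this drops any rotation information: a rotated quadrilateral is replaced by the larger upright rectangle that contains it.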

Thanks for your feedback. It is a mistake and the command should be:
img2dataset --url_list=url.txt --output_folder=laion_ocr --thread_count=64 --resize_mode=no
We will fix it in the readme file. Thanks!

Hi,

So, are the provided OCR labels measured at the original resolution of the image?

Jul 07 '23 15:07 other-ones

Hi! Thanks for your good work! If I use this command: "img2dataset --url_list=url.txt --output_folder=laion_ocr --thread_count=64 --resize_mode=no", where should I resize the ori_image? In train.py? (I find that the size of ori_image should be (512, 512).) @JingyeChen

Jul 11 '23 17:07 RanJason-Code

We have noticed that a few samples have mismatched annotations, caused by the resize operation when the dataset was released. We will fix this within one week. If you need to use the dataset urgently, please use the provided character-level segmenter to check whether its output matches the provided segmentation results.

Jul 11 '23 17:07 JingyeChen

Hi! Thanks for your good work! If I use this command: "img2dataset --url_list=url.txt --output_folder=laion_ocr --thread_count=64 --resize_mode=no", where should I resize the ori_image? In train.py? (I find that the size of ori_image should be (512, 512).) @JingyeChen

The problem is fixed. Please re-download the dataset using the link in README.md. After downloading, please resize every image to 512x512. Thanks!

Jul 13 '23 16:07 JingyeChen

@JingyeChen, hi, are the image URLs also updated, or is it just the metadata?

Jul 13 '23 18:07 Question406

@JingyeChen, hi, are the image URLs also updated, or is it just the metadata?

Hi, I'm also curious about this issue. I noticed that some of the coordinates in the OCR labels exceed 512, so I suspect that the labels were not measured at 512x512 resolution. Are the meta files also updated? Thanks.

Jul 13 '23 22:07 koow-eat

The metadata, including the detection and segmentation results, has been updated. rec/det/seg were all run at size 512x512. You can use np.clip(value, 0, 512) to clip the values.

@JingyeChen, hi, are the image URLs also updated, or is it just the metadata?

Hi, I'm also curious about this issue. I noticed that some of the coordinates in the OCR labels exceed 512, so I suspect that the labels were not measured at 512x512 resolution. Are the meta files also updated? Thanks.

Jul 14 '23 01:07 JingyeChen
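As a rough illustration of the np.clip suggestion, the sketch below clips a batch of box coordinates into the [0, 512] range. The array layout (one [x0, y0, x1, y1] row per box), the sample values, and the helper name clip_boxes are assumptions for the example; the actual layout of the Mario-LAION annotation files is not described in this thread.

```python
import numpy as np


def clip_boxes(boxes, size=512):
    """Clip every coordinate into [0, size] and return a new float array.

    `boxes` is assumed to be a list/array of [x0, y0, x1, y1] rows; this
    layout is hypothetical and may differ from the real annotation files.
    """
    return np.clip(np.asarray(boxes, dtype=np.float32), 0, size)


# Made-up boxes with a few coordinates outside the 512x512 canvas.
boxes = [[-4, 10, 530, 200], [100, 480, 300, 515]]
clipped = clip_boxes(boxes)
print(clipped)
```

Clipping only hides out-of-range values. If the annotations were produced at a different resolution than your images, the boxes will still be misplaced, which is why re-downloading the updated metadata is the safer route.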

The metadata, including the detection and segmentation results, has been updated. rec/det/seg were all run at size 512x512. You can use np.clip(value, 0, 512) to clip the values.

@JingyeChen, hi, are the image URLs also updated, or is it just the metadata?

Hi, I'm also curious about this issue. I noticed that some of the coordinates in the OCR labels exceed 512, so I suspect that the labels were not measured at 512x512 resolution. Are the meta files also updated? Thanks.

Can I use the previously downloaded files and just clip the values, or should I re-download them?

Jul 14 '23 01:07 koow-eat

It is recommended to re-download it. Thanks!

Jul 14 '23 03:07 JingyeChen

Hi! Thanks for your good work! If I use this command: "img2dataset --url_list=url.txt --output_folder=laion_ocr --thread_count=64 --resize_mode=no", where should I resize the ori_image? In train.py? (I find that the size of ori_image should be (512, 512).) @JingyeChen

The problem is fixed. Please re-download the dataset using the link in README.md. After downloading, please resize every image to 512x512. Thanks!

Thank you! I'll give it a try!

Jul 14 '23 13:07 RanJason-Code

Hi! I have something to confirm. After I re-download the metadata, should I put the Mario-LAION images into the corresponding directories, with their URLs as the keys, and resize them to 512x512? Also, is the resize operation in the preprocess_train function in train.py? @JingyeChen

Jul 14 '23 14:07 RanJason-Code

After I re-download the metadata, should I put the Mario-LAION images into the corresponding directories, with their URLs as the keys, and resize them to 512x512?

Yes, and you need to resize each image to 512x512 while "putting the Mario-LAION images into the corresponding directories", or perhaps add image = image.resize((512,512)) in train.py.

Jul 14 '23 14:07 JingyeChen
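For the offline variant of this step, a minimal sketch with Pillow might look like the following. The function name resize_to_512, the RGB conversion, and the idea of passing explicit source and destination paths are assumptions made for illustration. The mapping from downloaded files to the target folders (via the index/URL file) is not described in this thread, so it is left out.

```python
from pathlib import Path

from PIL import Image


def resize_to_512(src: Path, dst: Path, size=(512, 512)) -> None:
    """Resize one image to `size` (aspect ratio is NOT preserved) and save it.

    Mirrors the image.resize((512, 512)) call suggested above. Converting to
    RGB is an added assumption so that the result can be saved as JPEG.
    """
    with Image.open(src) as image:
        resized = image.convert("RGB").resize(size)
    dst.parent.mkdir(parents=True, exist_ok=True)
    resized.save(dst)
```

Calling resize_to_512(Path("downloaded/0001.jpg"), Path("laion_ocr/00000/0001.jpg")) would write the resized copy; both paths here are hypothetical placeholders and not the dataset's real layout.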

I get it! Thank you a lot, Senior Chen!

Jul 14 '23 14:07 RanJason-Code

It is hard to say and may take some time. Please stay tuned ;D

Hello, can you provide the training code for LayoutTransformer? I found that each text bbox is given as 4 points, i.e. [(x0, y0), (x1, y1), (x2, y2), (x3, y3)]; however, the released LayoutTransformer uses [x, y, x, y]. Can you provide more details on training LayoutTransformer?

Have you got the training code for LayoutTransformer?

Oct 07 '23 10:10 nkjulia

Thanks for your feedback. It is a mistake and the command should be:
img2dataset --url_list=url.txt --output_folder=laion_ocr --thread_count=64 --resize_mode=no
We will fix it in the readme file. Thanks!

Why can't we use --resize_mode=512 directly? Why must we instead follow the operation you mentioned: "after downloading, you need to resize each image to 512x512. Please follow mario-laion-index-url.txt to move each image to the corresponding folders."

By the way, could you please provide a script to "resize each image to 512x512 and move each image to the corresponding folders"?

May 18 '24 10:05 Ruby-He
