
Questions about training data

Open Sanster opened this issue 7 years ago • 24 comments

Thanks for your code and dataset!

I have two questions about training data:

  1. For some images the short side is not 600 pixels (e.g. img_1651), but the labels are 16 pixels wide, so in get_minibatch, after the image is rescaled, each box's width in gt_boxes will no longer be 16. Will this affect text detection performance?
  2. Should we remove labels like the following (red part) to get better performance?

img_1652.jpg img_1652

img_1676.jpg 123
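To make the first question concrete, here is a minimal sketch of how a 16-pixel-wide label shrinks or grows when the image is rescaled. It assumes Faster R-CNN style resizing (short side scaled to 600, long side capped at 1000); the function name `rescale_boxes` and both default values are illustrative assumptions, not code taken from this repository.

```python
def rescale_boxes(boxes, im_w, im_h, target_short=600, max_long=1000):
    """Scale (x1, y1, x2, y2) boxes by the factor that would be applied to the image.

    target_short and max_long are assumed defaults, not values read from the repo config.
    """
    scale = target_short / min(im_w, im_h)
    # Cap the scale so the long side does not exceed max_long.
    if scale * max(im_w, im_h) > max_long:
        scale = max_long / max(im_w, im_h)
    scaled = [tuple(v * scale for v in box) for box in boxes]
    return scale, scaled


# A 16-pixel-wide label on a hypothetical 1280x720 image.
scale, scaled = rescale_boxes([(32, 100, 48, 140)], im_w=1280, im_h=720)
width = scaled[0][2] - scaled[0][0]
print(scale, width)  # 0.78125 12.5
```

Under these assumptions the label is 12.5 pixels wide after rescaling, so it no longer lines up with the fixed 16-pixel anchor width.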

Sanster avatar Aug 06 '18 08:08 Sanster

There are also some images with wrong labels, like img_3591.jpg

image

img_5737.jpg and img_5741.jpg

Update: these images are wrongly labeled in the original MLT17 training dataset.

Sanster avatar Aug 07 '18 01:08 Sanster

I am trying to clean the data and recreate the anchor labels from MLT17 according to the minAreaRect of a text line. I am not sure whether the training result will be better, but I think it is worth a try. I will release the cleaned data at tf_ctpn once finished.

Split text line by bounding box: 1036_old

Split text line by minAreaRect: 1036
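The difference between the two splitting strategies can be sketched roughly as follows. This is not the code used in tf_ctpn; it is a simplified illustration that assumes the four corners of the rotated rectangle are already known (for example from OpenCV's `cv2.minAreaRect` followed by `cv2.boxPoints`), that the tilt is small, and that strips are aligned to a 16-pixel grid. The function names are hypothetical.

```python
def interp_y(p, q, x):
    """y on the straight line through points p and q at horizontal position x."""
    (x1, y1), (x2, y2) = p, q
    if x2 == x1:
        return float(y1)
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def split_by_bbox(tl, tr, br, bl, stride=16):
    """Every strip takes the full height of the axis-aligned bounding box."""
    xs = [p[0] for p in (tl, tr, br, bl)]
    ys = [p[1] for p in (tl, tr, br, bl)]
    left, right = int(min(xs)), int(max(xs))
    start = left - left % stride  # align to the 16-pixel grid
    return [(x0, min(ys), x0 + stride, max(ys)) for x0 in range(start, right, stride)]


def split_by_quad(tl, tr, br, bl, stride=16):
    """Each strip follows the top and bottom edges of the rotated rectangle."""
    xs = [p[0] for p in (tl, tr, br, bl)]
    left, right = int(min(xs)), int(max(xs))
    start = left - left % stride
    strips = []
    for x0 in range(start, right, stride):
        xa, xb = max(x0, left), min(x0 + stride, right)
        top = min(interp_y(tl, tr, xa), interp_y(tl, tr, xb))
        bottom = max(interp_y(bl, br, xa), interp_y(bl, br, xb))
        strips.append((x0, top, x0 + stride, bottom))
    return strips


# A slightly tilted text line: corners in order top-left, top-right, bottom-right, bottom-left.
corners = ((0, 0), (64, 16), (64, 36), (0, 20))
bbox_strips = split_by_bbox(*corners)
quad_strips = split_by_quad(*corners)
print(bbox_strips[0])  # (0, 0, 16, 36)
print(quad_strips[0])  # (0, 0.0, 16, 24.0)
```

In this example every bounding-box strip is 36 pixels tall, while the strips that follow the tilted rectangle are only 24 pixels tall, so they hug the text more tightly.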

Sanster avatar Aug 10 '18 03:08 Sanster

After recreating the ground truth labels and making several changes (see https://github.com/Sanster/tf_ctpn/commit/dc533e030e5431212c1d4dbca0bcd7e594a8a368 and https://github.com/Sanster/tf_ctpn/commit/7ae3d50d72bbdccb16f00987a5edb97659d6fbf2), I got better results on ICDAR13:

| Net | Dataset | Recall | Precision | Hmean |
|---|---|---|---|---|
| Origin CTPN | ICDAR13 + ? | 73.72% | 92.77% | 82.15% |
| vgg16 without commit https://github.com/Sanster/tf_ctpn/commit/dc533e030e5431212c1d4dbca0bcd7e594a8a368 and https://github.com/Sanster/tf_ctpn/commit/7ae3d50d72bbdccb16f00987a5edb97659d6fbf2 | data provided by @eragonruan | 63.69% | 71.46% | 67.35% |
| vgg16 with commit https://github.com/Sanster/tf_ctpn/commit/dc533e030e5431212c1d4dbca0bcd7e594a8a368 without commit https://github.com/Sanster/tf_ctpn/commit/7ae3d50d72bbdccb16f00987a5edb97659d6fbf2 | data provided by @eragonruan | 69.70% | 70.10% | 69.90% |
| vgg16 | MLT17 latin/chn new ground truth + icdar13 training data | 74.26% | 82.46% | 78.15% |
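For readers unfamiliar with the Hmean column: it is the harmonic mean of recall and precision (the F1 score). The short sketch below recomputes it from the rounded recall and precision figures reported above; the function name `hmean` and the row labels are just illustrative.

```python
def hmean(recall, precision):
    """Harmonic mean of recall and precision (F1), in the same units as the inputs."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall %, precision %, reported Hmean %) copied from the results above.
rows = {
    "Origin CTPN": (73.72, 92.77, 82.15),
    "vgg16, without both commits": (63.69, 71.46, 67.35),
    "vgg16, with first commit only": (69.70, 70.10, 69.90),
    "vgg16, new ground truth": (74.26, 82.46, 78.15),
}

for name, (recall, precision, reported) in rows.items():
    print(f"{name}: computed {hmean(recall, precision):.3f}, reported {reported}")
```

The recomputed values agree with the reported ones to within about 0.01 percentage points; small differences are expected because the inputs are already rounded.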

Sanster avatar Aug 13 '18 03:08 Sanster

@Sanster good job!

eragonruan avatar Aug 15 '18 02:08 eragonruan

@Sanster why is the score in the last line (VGG16) worse than in the first line (Origin CTPN), i.e.

74.26% | 82.46% | 78.15% v.s. 73.72% | 92.77% | 82.15% (recall, precision, Hmean) ?

interxuxing avatar Aug 17 '18 08:08 interxuxing

@interxuxing Maybe

  • No side-refinement part
  • A different way of going from Conv5 to the BLSTM, see https://github.com/eragonruan/text-detection-ctpn/issues/193
  • The training data is different
  • This implementation uses Adam; the original CTPN uses SGD
  • ...

Sanster avatar Aug 17 '18 09:08 Sanster

@Sanster Thank you for your prompt reply. I figured out that there are several differences between this implementation and the original paper.

As you have found, the MLT dataset is not very clean/accurate for training. Do you think synthesized data such as SynthText would help performance, even though the text is synthesized and embedded in template images?

interxuxing avatar Aug 17 '18 09:08 interxuxing

@interxuxing I think it is worth a try.

Sanster avatar Aug 21 '18 02:08 Sanster

Does "origin CTPN" mean the model released with the paper, or the Caffe repo?

saicoco avatar Sep 11 '18 02:09 saicoco

@saicoco The result for "origin CTPN" is from the ICDAR13 results page.

Sanster avatar Sep 11 '18 04:09 Sanster

OK, got it.

saicoco avatar Sep 11 '18 05:09 saicoco

@Sanster I have a question about this: if you remove labels like the following (red part), then in the RPN stage that small text may be selected as negative samples. Wouldn't that be a problem for training CTPN?

Wangweilai1 avatar Sep 11 '18 08:09 Wangweilai1

@Wangweilai1 I think vertical words (not suitable for CTPN) and very small words (which a human can't recognize) should be negative examples, or we can create an ignore mask like in EAST. I am not sure which way is better.
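A rough sketch of the ignore-mask idea at the anchor-labeling level: anchors that overlap an "ignored" region get the label -1 and contribute to neither the positive nor the negative set. This is only an illustration, not the EAST or tf_ctpn implementation; the function names and the `ignore_thr` value are assumptions, and the 0.7 / 0.5 thresholds follow the usual RPN-style convention.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, text_boxes, ignore_boxes,
                  pos_thr=0.7, neg_thr=0.5, ignore_thr=0.3):
    """Return 1 (positive), 0 (negative) or -1 (excluded from the loss) per anchor."""
    labels = []
    for anchor in anchors:
        best = max((iou(anchor, g) for g in text_boxes), default=0.0)
        touches_ignored = any(iou(anchor, g) > ignore_thr for g in ignore_boxes)
        if best >= pos_thr:
            labels.append(1)
        elif touches_ignored or best >= neg_thr:
            # Overlaps a vertical / tiny / unreadable region, or is ambiguous: skip it.
            labels.append(-1)
        else:
            labels.append(0)
    return labels


anchors = [(0, 0, 16, 20), (100, 0, 116, 10), (200, 0, 216, 20)]
text_boxes = [(0, 0, 16, 20)]      # clean, readable text
ignore_boxes = [(100, 0, 116, 8)]  # e.g. text too small to read
labels = label_anchors(anchors, text_boxes, ignore_boxes)
print(labels)  # [1, -1, 0]
```

With this scheme the unreadable text is never sampled as a negative example, which is the concern raised above.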

Sanster avatar Sep 18 '18 07:09 Sanster

Yeah, creating an ignore mask may be a good idea, to avoid text being selected as negative samples. Thanks for your answer!

Wangweilai1 avatar Sep 18 '18 09:09 Wangweilai1

Hi @Sanster @eragonruan

Could you please share how you draw the ground truth boxes on the training images? I am analyzing the difference between this model and the original CTPN model. Zhi Tian (the CTPN author) suggested that I check your dataset; maybe the ground truth is much wider than the text content. Thanks.

Nic-Ma avatar Sep 28 '18 04:09 Nic-Ma

@Sanster can you tell me how to calculate the precision and recall? Thanks!

boris-lb avatar Oct 26 '18 08:10 boris-lb

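@Sanster @eragonruan Hello, I am using my own dataset, which is handwritten text with no fixed orientation. The detection results are poor, especially for vertical text. How can I modify the algorithm to make it suitable for detecting vertical text? Or is there a better algorithm? Thanks.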

snowwindy avatar Nov 16 '18 07:11 snowwindy

default How do I fix this? It is very strange.

magicxiaobai avatar Jan 05 '19 05:01 magicxiaobai

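@Sanster @eragonruan Hello, I am using my own dataset, which is handwritten text with no fixed orientation. The detection results are poor, especially for vertical text. How can I modify the algorithm to make it suitable for detecting vertical text? Or is there a better algorithm? Thanks.

CTPN is only designed to detect horizontal text.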

northeastsquare avatar Feb 27 '19 08:02 northeastsquare

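default How do I fix this? It is very strange.

There is a problem with the dataset. Print out the variables involved in the error and check them.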

northeastsquare avatar Feb 27 '19 08:02 northeastsquare

Hi, how did you annotate your own data while preparing the training dataset?

hsonetta avatar Mar 05 '19 06:03 hsonetta

I am trying to clean the data and recreate the anchor labels from MLT17 according to the minAreaRect of a text line. I am not sure whether the training result will be better, but I think it is worth a try. I will release the cleaned data at tf_ctpn once finished.

Split text line by bounding box: 1036_old

Split text line by minAreaRect: 1036

I can't download the file from Google Drive, so I can't find the cleaned data. Could you provide it?

magicxiaobai avatar May 08 '19 03:05 magicxiaobai

@Sanster can you tell me how to calculate the precision and recall? Thanks!

Have you solved the problem? I have the same question. Could you share the method?

chiyandetaotie avatar Sep 14 '20 07:09 chiyandetaotie