DocGeoNet

Some questions about the textlines annotation process

Open KingRicardo opened this issue 2 years ago • 5 comments

Hello Hao, thanks for your awesome work on document image dewarping. Could you provide more details about the textline annotation process (e.g., the kernel sizes for binarization and dilation, and the filtering rule)?

KingRicardo, Dec 25 '22 05:12

Hi, sorry for the late reply; I have been unwell. I use cv2.adaptiveThreshold for binarization as follows:

cv2.adaptiveThreshold(xxx, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, ADAPTIVE_WINSZ, 25)

For dilation, the kernel size is 1 × 10 (height × width).

fh2019ustc, Jan 05 '23 08:01

Thanks for your reply. I hope you get well soon :) I still have two questions: how do you choose ADAPTIVE_WINSZ in cv2.adaptiveThreshold, and how do you filter out connected regions that are not textlines?

KingRicardo, Jan 06 '23 05:01

  ADAPTIVE_WINSZ = 35
  # width and height are the shape of the textline candidate
  if width < 30 or height < 2 or width < 1.5 * height:
      # this candidate is not a textline
      is_textline = False
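The filter rule can be wrapped in a small helper, sketched below. The thresholds come from the rule above; the function names, the constant names, and the (x, y, width, height) box layout are assumptions for illustration. How the candidate regions are extracted is not described in this thread.

```python
MIN_WIDTH = 30    # candidates narrower than this are rejected
MIN_HEIGHT = 2    # candidates shorter than this are rejected
MIN_ASPECT = 1.5  # width must be at least 1.5 times the height


def is_textline(width: float, height: float) -> bool:
    """Return True if a candidate of this size passes the filter rule."""
    return not (
        width < MIN_WIDTH
        or height < MIN_HEIGHT
        or width < MIN_ASPECT * height
    )


def filter_candidates(boxes):
    """Keep only the boxes that pass the rule.

    boxes: iterable of (x, y, width, height) tuples (assumed layout).
    """
    return [box for box in boxes if is_textline(box[2], box[3])]
```

For example, a 100 × 12 candidate is kept, while a 40 × 30 candidate is dropped because its width is less than 1.5 times its height.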

Hope this helps.

fh2019ustc, Jan 08 '23 07:01

Thank you for sharing the experimental details!

KingRicardo, Jan 09 '23 09:01

@fh2019ustc I have a question about the localization step of the textline annotation process. When creating the textline masks, did you fill in all the pixels inside the bounding boxes? Or did you shrink the heights of the bounding boxes so that the textline masks only pass through the middle of the boxes?

Soongja, Jan 10 '23 09:01