doctr [reconstitution] Improve synthesize output quality

@felixdittrich92 The synthesized result image is not of good quality: the fonts are broken in the result image.


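import matplotlib.pyplot as plt
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
# PDF
doc = DocumentFile.from_pdf("bankstatement.pdf")
# Analyze
result = model(doc)
# Show the first reconstituted (synthetic) page
plt.imshow(result.synthesize()[0]); plt.axis('off'); plt.show()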

See the result image: Figure_1

Originally posted by @tzktz in https://github.com/mindee/doctr/issues/1525#issuecomment-2020107199

Mar 26 '24 10:03 tzktz

Yeah, we can maybe align the y-coords between line elements (words) and add some small default horizontal padding between detections. CC @odulcy-mindee

Mar 26 '24 12:03 felixdittrich92

How can I change the font_family? @felixdittrich92

Mar 26 '24 12:03 tzktz

result.synthesize(font_family="XYZ")

Under the hood, this calls PIL:
font = ImageFont.truetype(font_family, font_size)

Mar 26 '24 12:03 felixdittrich92

synthetic_pages = result.synthesize(font_family='Arial.ttf', font_size=13)
plt.imshow(synthetic_pages[0]); plt.axis('off'); plt.show()

I get the same warning even when I pass font_family. @felixdittrich92

WARNING:root:unable to load recommended font family. Loading default PIL font,font size issues may be expected.To prevent this, it is recommended to specify the value of 'font_family'.

Mar 26 '24 12:03 tzktz

Is the font installed on your system?

Mar 26 '24 12:03 felixdittrich92

Yes, I have that font in my project folder. @felixdittrich92

Mar 27 '24 05:03 tzktz

Ah OK, got it. That's not enough: you need to install the font on your system: https://linuxiac.com/how-to-install-fonts-on-linux/#:~:text=Go%20to%20%E2%80%9CSystem%20Settings%E2%80%9D%20%3E,%E2%80%9CInstall%20from%20File%E2%80%9D%20button.&text=Then%20select%20the%20font%20files,%2Dwide%20or%20per%2Duser.
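A quick way to check whether PIL can resolve a font before passing it to synthesize is to try the same ImageFont.truetype call and catch the OSError it raises on failure. This is only a minimal sketch; can_load_font is a hypothetical helper name, not part of docTR.

```python
from PIL import ImageFont


def can_load_font(font_family: str, font_size: int = 13) -> bool:
    """Return True if PIL can load the given font file or font name.

    Hypothetical helper for illustration; not a docTR function.
    """
    try:
        ImageFont.truetype(font_family, font_size)
    except OSError:
        # PIL raises OSError when the font cannot be found or read
        return False
    return True
```

If this returns False for 'Arial.ttf', the warning above is expected. ImageFont.truetype also accepts a path to a font file, so checking the full path to the .ttf file may be worth a try.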

Mar 27 '24 05:03 felixdittrich92

See the input and output below. The result image quality is very poor and the pixels are broken. @felixdittrich92

Input image (1240 x 1754, 158.44 KB): input

Result image (1907 x 965, 46 KB): Figure_1

Mar 27 '24 06:03 tzktz

@felixdittrich92 any update?

Apr 02 '24 11:04 tzktz

Hi @tzktz :wave:,

Unfortunately I don't have the time to work on that at the moment, so we need to address this later, or you can work on it if you want (feel free to open a PR).

Related code can be found at: https://github.com/mindee/doctr/blob/0d849152a852cb55b3fb0cc0e5e602600349d97d/doctr/utils/visualization.py#L291

Best regards, Felix

Apr 02 '24 11:04 felixdittrich92

@felixdittrich92 Hey, sorry for being MIA. I needed to take some time off. I am back now and I was hoping I could take up this issue? Let me know regarding this :)

Apr 27 '24 12:04 SkaarFacee

Hey @SkaarFacee 👋 Sure, feel free to work on it 😊 The code moved a bit; it is now in: https://github.com/mindee/doctr/blob/main/doctr/utils/reconstitution.py

Apr 27 '24 12:04 felixdittrich92

Okay, thanks. Let me take a look at what I can do.

Apr 27 '24 13:04 SkaarFacee

@felixdittrich92 Do you have any suggestions on how I can improve the quality of the image?

Apr 29 '24 12:04 SkaarFacee

@SkaarFacee One thing we could do: if we have the line box information, we could align all boxes inside it to the line's y-coordinate (to get a straighter view). I found the following HF space where the reconstitution looks not bad; maybe you can use it as a reference or to get some inspiration ^^ : https://huggingface.co/spaces/SWHL/RapidOCRDemo/blob/main/utils.py

Apr 29 '24 13:04 felixT2K

Okay, I will take a look and see what can be done over the weekend. This doesn't look that complex at a quick glance :)

May 02 '24 09:05 SkaarFacee

Yeah, I think so too :)

May 02 '24 09:05 felixdittrich92

Hey, I am working on this, sorry for the delay. Something came up at work and kept me busy.

May 08 '24 04:05 SkaarFacee

@felixT2K I was using the link you mentioned as a reference (https://huggingface.co/spaces/SWHL/RapidOCRDemo/blob/main/utils.py), but I can't exactly pinpoint the place where the y-coordinate is used to align the line. If the goal is to straighten the line, why don't we make the y-coordinates of each box in the line the same using a mathematical approach (using the mean or centroid of the boxes)? If you could give me more insight into the HF reference, I would gladly implement that as well 😄
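The mean-based idea could look roughly like the sketch below. It assumes each box is a straight box given as ((xmin, ymin), (xmax, ymax)); align_to_mean_y is a hypothetical name used only for illustration.

```python
from statistics import mean

# Assumed box format: ((xmin, ymin), (xmax, ymax))
Box = tuple[tuple[float, float], tuple[float, float]]


def align_to_mean_y(boxes: list[Box]) -> list[Box]:
    """Give every box in a line the same top and bottom (the mean of each).

    Hypothetical helper for illustration; x-coordinates are left untouched.
    """
    if not boxes:
        return []
    top = mean(b[0][1] for b in boxes)
    bottom = mean(b[1][1] for b in boxes)
    return [((b[0][0], top), (b[1][0], bottom)) for b in boxes]
```

All boxes passed in are treated as belonging to the same line, so the grouping into lines has to happen before this step.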

May 11 '24 18:05 SkaarFacee

Correct, that was what I had in mind: you know which boxes are in one line (if `resolve_lines=True`; otherwise, if only one line element is available, keep the y-coords of each box), then take the line's y-coordinate for each box to straighten the boxes on the line :)
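As a rough sketch of that idea, assuming line and word geometries are straight boxes in the form ((xmin, ymin), (xmax, ymax)): align_words_to_line and its lines_resolved flag are hypothetical names, with the flag standing in for whether the lines were resolved.

```python
def align_words_to_line(line_box, word_boxes, lines_resolved=True):
    """Snap each word box to the top and bottom of its line box.

    Hypothetical helper for illustration. If the lines were not resolved,
    the word boxes are returned with their original y-coords.
    """
    if not lines_resolved:
        return list(word_boxes)
    (_, line_top), (_, line_bottom) = line_box
    return [
        ((x_min, line_top), (x_max, line_bottom))
        for (x_min, _), (x_max, _) in word_boxes
    ]
```

Only the vertical extent changes; each word keeps its own xmin and xmax.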

May 11 '24 18:05 felixdittrich92

@SkaarFacee any updates ? :hugs:

Jun 06 '24 05:06 felixdittrich92

Hey, I am soo sorry. I had some personal issues that needed some time to be resolved. Everything is fine now and back to normal. I shall give you an update asap now 😓 😞

Jun 09 '24 07:06 SkaarFacee

No problem :)

In the meantime I identified the root issue.

The code itself works as expected.
Aligning to the line geometry can be added easily.
Issue: the font size computation isn't correct (it should be computed from the geometry size relative to the page size); that's why the output sometimes looks like gibberish.
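One possible way to derive the font size from the geometry relative to the page is sketched below. It assumes relative (0 to 1) box coordinates in the form ((xmin, ymin), (xmax, ymax)); the function name, the 0.75 scale factor and the minimum size are assumptions for illustration, not values taken from docTR.

```python
def font_size_from_geometry(box, page_height, scale=0.75, min_size=6):
    """Estimate a font size in pixels from a relative box and the page height.

    Hypothetical helper: `scale` and `min_size` are illustrative assumptions.
    """
    (_, y_min), (_, y_max) = box
    # Relative height times page height gives the box height in pixels
    box_height_px = (y_max - y_min) * page_height
    return max(min_size, int(round(box_height_px * scale)))
```

The scale factor would need tuning per font, since the glyph height is usually smaller than the nominal font size.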

Jun 09 '24 11:06 felixdittrich92

Regarding the font size computation, should we maybe add a limit on the font-to-page-size ratio so that the issue becomes less frequent? Also, regarding the line geometry alignment, I think we could add a tolerance value of some sort so that words with y pixel values of, say, 40 and 43 come together in a straight line. (Here I assume a tolerance of ±5.)
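A minimal sketch of the tolerance idea: sort the y values, start a new group whenever a value is more than the tolerance away from the first value of the current group, and snap every member to the group mean. The name snap_y and the choice of the mean as the target are assumptions.

```python
def snap_y(y_values, tolerance=5.0):
    """Snap y values that lie within `tolerance` of each other to a common y.

    Hypothetical helper for illustration. Returns the snapped values in the
    same order as the input.
    """
    order = sorted(range(len(y_values)), key=lambda i: y_values[i])
    groups = []
    for i in order:
        # Compare against the first (smallest) y of the current group
        if groups and y_values[i] - y_values[groups[-1][0]] <= tolerance:
            groups[-1].append(i)
        else:
            groups.append([i])
    snapped = [0.0] * len(y_values)
    for group in groups:
        target = sum(y_values[i] for i in group) / len(group)
        for i in group:
            snapped[i] = target
    return snapped
```

With a tolerance of 5, values of 40 and 43 end up in one group and share the same y, while a value of 80 stays on its own.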

Jun 10 '24 08:06 SkaarFacee

Hey @SkaarFacee :wave:,

Your point about the line alignment sounds good to me :+1:
About the font size: I think we need some logic to get the ratio of a geometry to the page size and then find some calculation to compute the font size, or we map the ratio to some fixed font size values which fit well. WDYT?
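The mapping variant could be sketched as a simple lookup with bisect. The breakpoints and font sizes below are made-up placeholders that would need tuning on real samples, and the ratio is assumed to be the box height divided by the page height.

```python
from bisect import bisect_right

# Placeholder values for illustration only; they are not tuned on real data
RATIO_BREAKPOINTS = [0.01, 0.02, 0.04, 0.08]
FONT_SIZES = [8, 12, 18, 28, 40]


def font_size_for_ratio(ratio: float) -> int:
    """Map a box-height-to-page-height ratio to a fixed font size bucket."""
    return FONT_SIZES[bisect_right(RATIO_BREAKPOINTS, ratio)]
```

A fixed table like this is easy to adjust by eye, at the cost of visible jumps between buckets compared to a continuous formula.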

Jun 11 '24 06:06 felixdittrich92

Yes. Do we have some sample images to get started with in this case?

Jun 13 '24 09:06 SkaarFacee

Yes, give me a few minutes and I can attach some.

Jun 13 '24 09:06 felixdittrich92

Here you go :) samples.zip

Jun 13 '24 09:06 felixdittrich92

Are these samples that need to be improved or a combination of good outputs and bad outputs ?

Jun 14 '24 23:06 SkaarFacee

Only some samples for testing :)

Jun 15 '24 04:06 felixdittrich92
