doctr
[reconstitution] Improve synthesize output quality
@felixdittrich92 The quality of the result image is poor: the fonts are broken in the result image.
import matplotlib.pyplot as plt
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
# PDF
doc = DocumentFile.from_pdf("bankstatement.pdf")
# Analyze
result = model(doc)
# Show the first synthesized page
plt.imshow(result.synthesize()[0]); plt.axis('off'); plt.show()
See the result image.
Originally posted by @tzktz in https://github.com/mindee/doctr/issues/1525#issuecomment-2020107199
Yeah we can maybe align the y-coords between line elements (words) and add some small horizontal default padding between detections CC @odulcy-mindee
How do I change the font_family? @felixdittrich92
result.synthesize(font_family="XYZ")
under the hood calls PIL:
font = ImageFont.truetype(font_family, font_size)
synthetic_pages = result.synthesize(font_family='Arial.ttf', font_size=13)
plt.imshow(synthetic_pages[0]); plt.axis('off'); plt.show()
I get the same warning even when I pass font_family. @felixdittrich92
WARNING:root:unable to load recommended font family. Loading default PIL font,font size issues may be expected.To prevent this, it is recommended to specify the value of 'font_family'.
Is the font installed on your system?
Yes, I have that font in my project folder. @felixdittrich92
Ah ok, got it. That's not enough: you need to install the font on your system: https://linuxiac.com/how-to-install-fonts-on-linux/#:~:text=Go%20to%20%E2%80%9CSystem%20Settings%E2%80%9D%20%3E,%E2%80%9CInstall%20from%20File%E2%80%9D%20button.&text=Then%20select%20the%20font%20files,%2Dwide%20or%20per%2Duser.
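One thing that may be worth trying before a system-wide install: as noted above, the `font_family` string is handed to `ImageFont.truetype`, and that PIL function accepts a path to a font file as well as a font name. Passing the absolute path of the `.ttf` file might therefore be enough. The helper below is only a sketch: `resolve_font_path` and the `fonts` directory are hypothetical names, not part of docTR, and the function only locates the file without loading it.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_font_path(font_file: str, search_dirs: Iterable[str]) -> Optional[str]:
    """Return the absolute path of the first matching font file, or None."""
    for directory in search_dirs:
        candidate = Path(directory) / font_file
        if candidate.is_file():
            return str(candidate.resolve())
    return None


# Hypothetical usage, assuming Arial.ttf sits in the project folder:
# font_path = resolve_font_path("Arial.ttf", [".", "fonts"])
# if font_path is not None:
#     synthetic_pages = result.synthesize(font_family=font_path, font_size=13)
```

If the file is not found, the helper returns `None`, in which case installing the font on the system remains the fallback.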
See the input and output results below. The result image quality is very poor and the pixels are broken. @felixdittrich92
Input image: 1240 x 1754, 158.44 kB
Result image: 1907 x 965, 46 kB
@felixdittrich92 Any update?
Hi @tzktz :wave:,
Unfortunately I don't have the time to work on that at the moment, so we need to address this later, or you can work on it yourself if you want (feel free to open a PR).
Related code can be found at: https://github.com/mindee/doctr/blob/0d849152a852cb55b3fb0cc0e5e602600349d97d/doctr/utils/visualization.py#L291
Best regards, Felix
@felixdittrich92 Hey, sorry for being MIA. I needed to take some time off. I am back now and I was hoping I could take up this issue? Let me know regarding this :)
Hey @SkaarFacee 👋 Sure, feel free to work on it 😊 The code moved a bit; it is now in: https://github.com/mindee/doctr/blob/main/doctr/utils/reconstitution.py
Okay, thanks. Let me take a look at what I can do.
@felixdittrich92 Do you have any suggestions on how I can improve the quality of the image ?
@SkaarFacee One thing we could do: if we have the line box information, we could align all boxes inside it to the line's y coordinate (to get a straighter view). I found the following HF Space where the reconstitution looks not bad; maybe you can use it as a reference or to get some inspiration ^^ : https://huggingface.co/spaces/SWHL/RapidOCRDemo/blob/main/utils.py
Okay, I will take a look and see what can be done over the weekend. This doesn't look that complex at a quick glance :)
Yeah, I think so too :)
Hey, I am working on this, sorry for the delay. Something came up at work and kept me busy.
@felixT2K I was using the link you mentioned as a reference (https://huggingface.co/spaces/SWHL/RapidOCRDemo/blob/main/utils.py), but I can't pinpoint exactly where the y coordinate is used to align the line. If the goal is to straighten the line, why don't we make the y coordinates of each box in the line the same using a mathematical approach (using the mean or centroid of the boxes)? If you could give me more insight into the HF reference, I would gladly implement that too 😄
Correct, that is what I had in mind: you know which boxes are in one line (if `resolve_lines=True`; otherwise, if only one line element is available, keep the y coords of each box), then take the line's y coordinates for each box to straighten the boxes on the line :)
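The alignment described here could look roughly like the sketch below. It assumes straight boxes stored as `((xmin, ymin), (xmax, ymax))`; `align_words_to_line` is a hypothetical helper name, not an existing docTR function.

```python
from typing import List, Tuple

# Assumed box layout: ((xmin, ymin), (xmax, ymax))
Box = Tuple[Tuple[float, float], Tuple[float, float]]


def align_words_to_line(line_box: Box, word_boxes: List[Box]) -> List[Box]:
    """Give every word box the vertical extent of its line; x values are untouched."""
    (_, line_ymin), (_, line_ymax) = line_box
    return [
        ((xmin, line_ymin), (xmax, line_ymax))
        for (xmin, _), (xmax, _) in word_boxes
    ]


line = ((0.1, 0.20), (0.9, 0.25))
words = [((0.1, 0.21), (0.3, 0.24)), ((0.35, 0.20), (0.6, 0.25))]
aligned = align_words_to_line(line, words)
```

After the call, both words share the line's `ymin` and `ymax`, so they would be drawn on one straight baseline.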
@SkaarFacee any updates ? :hugs:
Hey, I am so sorry. I had some personal issues that needed some time to be resolved. Everything is fine now and back to normal. I shall give you an update asap now 😓 😞
No problem :)
In the meantime I identified the root issue.
- The code itself works as expected
- Alignment to the line geometry can be added easily
- Issue: the font size computation isn't correct (it should be computed from the geometry size relative to the page size); that's why the output sometimes looks like gibberish
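A minimal sketch of a font size derived from the geometry relative to the page, assuming relative box coordinates in `[0, 1]` and a known page height in pixels. The function name, the `fill_ratio` of 0.75, and the minimum size are illustrative assumptions that would need tuning on real samples.

```python
from typing import Tuple

# Assumed box layout: ((xmin, ymin), (xmax, ymax)) in relative coordinates
Box = Tuple[Tuple[float, float], Tuple[float, float]]


def estimate_font_size(
    box: Box, page_height_px: int, fill_ratio: float = 0.75, min_size: int = 6
) -> int:
    """Scale the font size with the box height measured in page pixels."""
    (_, ymin), (_, ymax) = box
    box_height_px = (ymax - ymin) * page_height_px
    # Text should fill only part of the box height; never go below min_size.
    return max(min_size, int(fill_ratio * box_height_px))


size = estimate_font_size(((0.1, 0.25), (0.4, 0.5)), page_height_px=100)
```

Because the size follows the box height in pixels, the same relative box yields a larger font on a larger page.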
Regarding the font size computation, should we maybe add a limit on the font-to-page size ratio so that the issue becomes less frequent? Also, regarding the line geometry alignment, I think we could add a tolerance value of some sort so that words with y pixel values of, say, 40 and 43 end up on one straight line. (Here I assume a tolerance of ±5.)
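The tolerance idea could be sketched as follows: y values are grouped when they lie within the tolerance of the smallest value in their group, and each group is then snapped to its mean. `snap_y_values` is a hypothetical name, and the default of 5 pixels simply mirrors the assumption above.

```python
from typing import List


def snap_y_values(y_values: List[float], tolerance: float = 5.0) -> List[float]:
    """Replace nearby y values by the mean of their group, keeping input order."""
    order = sorted(range(len(y_values)), key=lambda i: y_values[i])
    groups: List[List[int]] = []
    for idx in order:
        # Join the current group if within tolerance of its smallest member,
        # otherwise start a new group.
        if groups and y_values[idx] - y_values[groups[-1][0]] <= tolerance:
            groups[-1].append(idx)
        else:
            groups.append([idx])
    snapped = [float(y) for y in y_values]
    for group in groups:
        mean_y = sum(y_values[i] for i in group) / len(group)
        for i in group:
            snapped[i] = mean_y
    return snapped


snapped = snap_y_values([40, 43, 120])
```

Here 40 and 43 fall into one group and share a single y value afterwards, while 120 stays on its own line.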
Hey @SkaarFacee :wave:,
- The point about line alignment sounds good to me :+1:
- About the font size: I think we need some logic to get the ratio of a geometry to the page size and then either find a formula to compute the font size from it, or map the ratio to some fixed font size values that fit well. WDYT?
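The mapping variant could be sketched like this, with the ratio taken as box height divided by page height. The thresholds and font sizes are placeholder values chosen for illustration only; they are assumptions, not tuned numbers, and `font_size_for_ratio` is a hypothetical name.

```python
from bisect import bisect_right

# Placeholder thresholds on (box height / page height); assumed, not tuned.
RATIO_THRESHOLDS = [0.01, 0.02, 0.04, 0.08]
# One fixed font size per bucket (one more entry than thresholds).
FONT_SIZES = [8, 10, 14, 20, 28]


def font_size_for_ratio(height_ratio: float) -> int:
    """Pick the fixed font size of the bucket that contains the ratio."""
    return FONT_SIZES[bisect_right(RATIO_THRESHOLDS, height_ratio)]


small = font_size_for_ratio(0.005)
large = font_size_for_ratio(0.5)
```

A lookup table like this is easy to adjust per sample set, at the cost of visible jumps in size at the bucket boundaries.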
Yes. Do we have some sample images to get started with in this case?
Yes, give me a few minutes and I can attach some.
Here you go :) samples.zip
Are these samples that need to be improved or a combination of good outputs and bad outputs ?
Only some samples for testing :)