Detection boxes tend to include the dots of i's from the line below
Bug description
When I run the detector on the following image,
the word boxes tend to pick up dots that belong to the i's in the line below them (see the picture under "Error traceback").
Code snippet to reproduce the bug
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from doctr.models import detection_predictor
from doctr.io import DocumentFile


def visualize_word_boxes(image_path, word_boxes):
    # Load the image
    image = plt.imread(image_path)
    # Get image dimensions (works for both grayscale and colour images)
    image_height, image_width = image.shape[:2]
    # Create figure and axes
    fig, ax = plt.subplots()
    ax.imshow(image)
    # Plot word boxes
    for box in word_boxes:
        # Convert normalized coordinates to absolute pixel values
        x1 = int(box[0] * image_width)
        y1 = int(box[1] * image_height)
        x2 = int(box[2] * image_width)
        y2 = int(box[3] * image_height)
        # Create a rectangle patch
        rect = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=1, edgecolor='r', facecolor='none')
        # Add the patch to the Axes
        ax.add_patch(rect)
    # Show the plot
    plt.show()


image_path = "/home/rmast/Downloads/Brief gemeente 300dpi voorkant.jpg"
# Run the detection model on the image
model = detection_predictor(arch='db_resnet50', pretrained=True)
doc = DocumentFile.from_images(image_path)
result = model(doc)
# Assuming 'words' holds the word boxes in normalized coordinates
word_boxes = result[0]['words']
visualize_word_boxes(image_path, word_boxes)
Error traceback
In "Op meerdere plaatsen [op]", the box around [op] contains a dot from the line below.
In "kruisingen [op] het Kerkplein", the box around [op] also contains a dot from the line below.
In "We [gaan] de kruisingen", the box around [gaan] also contains a dot from the line below.
It appears the descenders of p and g increase the risk of this happening.
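To put a number on the observation above, the following is a minimal sketch that flags boxes whose bottom edge reaches into a horizontally overlapping box on a lower line. It is not part of docTR: the function name `find_line_intrusions` is hypothetical, and it assumes each box is laid out as (xmin, ymin, xmax, ymax, optional score) in normalized coordinates, which is how the snippet above indexes `box[0]` to `box[3]`.

```python
from typing import List, Sequence, Tuple

# Assumed layout: (xmin, ymin, xmax, ymax[, score]), normalized to [0, 1]
Box = Sequence[float]


def find_line_intrusions(boxes: Sequence[Box]) -> List[Tuple[int, int, float]]:
    """Return (upper_idx, lower_idx, intrusion) triples.

    A pair is reported when the lower box starts below the vertical middle
    of the upper box, the two boxes overlap horizontally, and the bottom
    edge of the upper box lies below the top edge of the lower box.
    'intrusion' is that vertical overlap in normalized units.
    """
    hits = []
    for i, upper in enumerate(boxes):
        ux1, uy1, ux2, uy2 = upper[:4]
        upper_mid = (uy1 + uy2) / 2
        for j, lower in enumerate(boxes):
            if i == j:
                continue
            lx1, ly1, lx2, ly2 = lower[:4]
            # Only consider boxes that start lower on the page
            if ly1 <= upper_mid:
                continue
            # Words on different lines must share some horizontal extent
            if min(ux2, lx2) - max(ux1, lx1) <= 0:
                continue
            intrusion = uy2 - ly1
            if intrusion > 0:
                hits.append((i, j, intrusion))
    return hits
```

Calling `find_line_intrusions(word_boxes)` on the detector output would list the index pairs to inspect; multiplying the intrusion by the image height gives the overlap in pixels. If the descender hypothesis holds, the flagged upper boxes should mostly be words containing p or g.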
Environment
DocTR version: v0.8.1
TensorFlow version: N/A
PyTorch version: 2.2.2 (torchvision 0.17.2)
OpenCV version: 4.9.0
OS: Linux Mint 20.3
Python version: 3.12.3
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): Yes
CUDA runtime version: 12.1.66
GPU models and configuration: GPU 0: NVIDIA GeForce GT 1030
Nvidia driver version: 535.86.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.3
Deep Learning backend
Python 3.12.3 | packaged by Anaconda, Inc. | (main, Apr 19 2024, 16:50:38) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from doctr.file_utils import is_tf_available, is_torch_available
>>> print(f"is_tf_available: {is_tf_available()}")
is_tf_available: False
>>> print(f"is_torch_available: {is_torch_available()}")
is_torch_available: True