
Detection prefers to include dots of i's underneath

Open rmast opened this issue 1 year ago • 0 comments

Bug description

When I run the detector on the following image (Brief gemeente 300dpi voorkant), some of the detected boxes include dots that belong to the i's on the line below the box (see the picture under "Error traceback").

Code snippet to reproduce the bug

import matplotlib.pyplot as plt
import matplotlib.patches as patches
from doctr.models import detection_predictor
from doctr.io import DocumentFile



def visualize_word_boxes(image_path, word_boxes):
    # Load the image
    image = plt.imread(image_path)

    # Get image dimensions (works for both grayscale and color images)
    image_height, image_width = image.shape[:2]

    # Create figure and axes
    fig, ax = plt.subplots()
    ax.imshow(image)

    # Plot word boxes
    for box in word_boxes:
        # Convert normalized coordinates to absolute pixel values
        x1 = int(box[0] * image_width)
        y1 = int(box[1] * image_height)
        x2 = int(box[2] * image_width)
        y2 = int(box[3] * image_height)

        # Create a rectangle patch
        rect = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=1, edgecolor='r', facecolor='none')

        # Add the patch to the Axes
        ax.add_patch(rect)

    # Show the plot
    plt.show()

# Path to the scanned letter used to reproduce the issue
image_path = "/home/rmast/Downloads/Brief gemeente 300dpi voorkant.jpg"

# Run the pretrained detection model on the image
model = detection_predictor(arch='db_resnet50', pretrained=True)
doc = DocumentFile.from_images(image_path)
result = model(doc)
# 'words' is expected to hold rows of relative (xmin, ymin, xmax, ymax, score)
word_boxes = result[0]['words']
visualize_word_boxes(image_path, word_boxes)

Error traceback

Detected boxes

In "Op meerdere plaatsen [op]", the box around [op] contains a dot from the line below. In "kruisingen [op] het Kerkplein", this [op] also contains a dot from below. In "We [gaan] de kruisingen", [gaan] also has a dot from below. It appears that the descenders of p and g increase the risk of this happening.
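To make the symptom easier to quantify, here is a minimal sketch that lists boxes whose bottom edge reaches into a box on the line below. It assumes each row starts with relative (xmin, ymin, xmax, ymax) coordinates with the y axis pointing down, as in the snippet above. The helper name `find_vertical_overlaps` and the `min_overlap` parameter are hypothetical and not part of docTR.

```python
def find_vertical_overlaps(word_boxes, min_overlap=0.0):
    """Return (upper_idx, lower_idx, overlap) for boxes that reach into a box below.

    word_boxes: sequence of rows starting with (xmin, ymin, xmax, ymax).
    Any extra columns, such as a confidence score, are ignored.
    Assumption: coordinates are relative and y grows downwards.
    """
    boxes = [tuple(float(v) for v in row[:4]) for row in word_boxes]
    pairs = []
    for i, (ax1, ay1, ax2, ay2) in enumerate(boxes):
        a_center = (ay1 + ay2) / 2
        for j, (bx1, by1, bx2, by2) in enumerate(boxes):
            if i == j:
                continue
            # Treat box j as "on a lower line" when its top lies below
            # the vertical center of box i.
            if by1 <= a_center:
                continue
            # The two boxes must share some horizontal extent.
            if min(ax2, bx2) - max(ax1, bx1) <= 0:
                continue
            # How far the bottom of box i extends past the top of box j.
            overlap = ay2 - by1
            if overlap > min_overlap:
                pairs.append((i, j, overlap))
    return pairs


if __name__ == "__main__":
    # Toy boxes: box 0 dips slightly into box 1, box 2 is unrelated.
    demo = [
        [0.10, 0.10, 0.20, 0.16, 0.9],
        [0.12, 0.15, 0.30, 0.20, 0.9],
        [0.50, 0.10, 0.60, 0.14, 0.9],
    ]
    for upper, lower, overlap in find_vertical_overlaps(demo):
        print(f"box {upper} reaches {overlap:.3f} into box {lower}")
```

With the reproduction snippet, this could be called as `find_vertical_overlaps(word_boxes)` to check whether the [op] and [gaan] boxes are among the reported pairs.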

Environment

DocTR version: v0.8.1
TensorFlow version: N/A
PyTorch version: 2.2.2 (torchvision 0.17.2)
OpenCV version: 4.9.0
OS: Linux Mint 20.3
Python version: 3.12.3
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): Yes
CUDA runtime version: 12.1.66
GPU models and configuration: GPU 0: NVIDIA GeForce GT 1030
Nvidia driver version: 535.86.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.3

Deep Learning backend

Python 3.12.3 | packaged by Anaconda, Inc. | (main, Apr 19 2024, 16:50:38) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from doctr.file_utils import is_tf_available, is_torch_available
>>> print(f"is_tf_available: {is_tf_available()}")
is_tf_available: False
>>> print(f"is_torch_available: {is_torch_available()}")
is_torch_available: True

rmast, Apr 22 '24 18:04