Text detection model performance
Hi, @robertknight I've been evaluating your text detection model "text-detection-checkpoint-ssfbcj81.pt" and found that its performance is lower than expected for my use case. Could you share more details about:
- The specific datasets used for training this model
- Whether the training process was completed to convergence
- Any specific preprocessing or usage requirements I should be aware of to achieve optimal results
I'm trying to understand if there's something I've missed in my implementation or if the model has known limitations for certain types of documents or text characteristics.
The text detection threshold of this model is different from the default one. See https://github.com/robertknight/ocrs/discussions/160#discussioncomment-12939717. If using Ocrs, changing the threshold currently requires editing this value in the source: https://github.com/robertknight/ocrs/blob/0e85f3bace12b37b15b7b025c53c0d800caa23f0/ocrs/src/detection.rs#L33.
As for the training dataset and process, this is covered in the README: https://github.com/robertknight/ocrs-models?tab=readme-ov-file#datasets.
The metrics for the training run that produced this model are at https://wandb.ai/robertknight/text-detection/runs/ssfbcj81?nw=nwuserrobertknight. Most of the metrics are pixel-level, so unfortunately they are not directly interpretable as "how well does text box extraction work?". For future runs it would be better to gather metrics that more directly reflect the final output after post-processing, or to change the model architecture to predict boxes directly instead of a segmentation mask.
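A box-level metric of the kind described above could be sketched like this. This is a simplified illustration, not code from ocrs-models: it uses axis-aligned boxes, greedy IoU matching, and an assumed 0.5 IoU threshold, whereas a real evaluation (e.g. HierText's protocol) is more involved.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def box_precision_recall(preds, gts, iou_thresh=0.5):
    """Greedily match each predicted box to at most one ground-truth box."""
    matched = set()
    tp = 0
    for p in preds:
        # Best remaining unmatched ground-truth box for this prediction.
        candidates = [(iou(p, g), i) for i, g in enumerate(gts) if i not in matched]
        if candidates:
            best_iou, best_i = max(candidates)
            if best_iou >= iou_thresh:
                tp += 1
                matched.add(best_i)
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall

# Example: one correct detection, one missed ground-truth box.
p, r = box_precision_recall([(0, 0, 10, 10)], [(0, 0, 10, 10), (20, 20, 30, 30)])
print(p, r)  # 1.0 0.5
```

Unlike pixel-level scores, numbers computed this way directly reflect how many text boxes survive post-processing intact.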
Hi @robertknight, I followed what you said above. My results after 100 epochs look like this:
'recall': '0.401', 'split_frac': '0.040', 'merged_frac': '0.089', 'precision': '0.443'
dataset: HierText
Did I do anything wrong or miss something necessary?
Any suggestions would be highly appreciated.
Did you start training from scratch or did you try to fine-tune an existing checkpoint?
It is helpful to visualize the outputs at different stages to understand errors better. The model itself outputs a pixel-level text/not-text probability map. This is then thresholded to get a binary text/not-text classification. Finally, post-processing finds connected components in the mask and computes the minimum-area oriented bounding rectangle of each component. You can get a feel for what these look like using the --text-map, --text-mask and --png flags of the Ocrs CLI tool (see ocrs --help for info).
If you are training or fine-tuning your own model, visualizing training progress using the Weights and Biases integration is helpful. You can check these metrics against previous training runs at https://wandb.ai/robertknight/text-detection?nw=nwuserrobertknight.
Hi @robertknight, thank you for your quick reply. I will follow what you suggested.