
Training (or Fine-Tuning) the Model

Open: martholomew opened this issue 2 years ago • 1 comment

I would like to fine-tune the model on the kind of data I will be feeding it. My pipeline would be to binarize the images using sbb_binarize, manually correct the outputs into high-quality ground truth, and then train the model further on a large number of these images.

  1. Would the end-result be better binarization on my dataset?
  2. How would this be accomplished?

A link to point me in the right direction would be a great help.

martholomew • Oct 14 '23 10:10

Dear @martholomew,

Yes, pseudo-labeling can be effective, and we have used this technique ourselves to improve our models. For training, you can use https://github.com/qurator-spk/sbb_pixelwise_segmentation. First binarize your dataset with our models, then select the documents with satisfactory results and use them as a custom training dataset. Sometimes a prediction is good only in parts of a page. In such cases, you can crop those regions out to prepare your ground truth (GT).
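The cropping step could look roughly like the sketch below. It is only an illustration using NumPy arrays: the helper name `crop_pair` and the `(top, left, bottom, right)` box convention are made up here and are not part of sbb_binarization or sbb_pixelwise_segmentation, and the arrays are synthetic stand-ins for a scanned page and its predicted binarization. How the crops should be stored for training is not covered; check the training repository for the layout it expects.

```python
import numpy as np


def crop_pair(image, label, box):
    """Cut the same rectangle out of a page image and its binarized prediction.

    box is (top, left, bottom, right) in pixels; this convention is an
    assumption made for the sketch.
    """
    top, left, bottom, right = box
    if image.shape[:2] != label.shape[:2]:
        raise ValueError("image and label must have the same height and width")
    height, width = label.shape[:2]
    if not (0 <= top < bottom <= height and 0 <= left < right <= width):
        raise ValueError("box must lie inside the page")
    # Copy so the crops do not keep the full page arrays alive.
    image_crop = image[top:bottom, left:right].copy()
    label_crop = label[top:bottom, left:right].copy()
    return image_crop, label_crop


# Synthetic stand-ins: an RGB page and a single-channel prediction.
page = np.zeros((100, 200, 3), dtype=np.uint8)
prediction = np.zeros((100, 200), dtype=np.uint8)
prediction[10:60, 20:120] = 255  # pretend only this region was predicted well

image_crop, label_crop = crop_pair(page, prediction, (10, 20, 60, 120))
print(image_crop.shape, label_crop.shape)
```

The important point is that the image and its label are cut with exactly the same box, so the cropped pair stays pixel-aligned and can be used as a GT sample.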

vahidrezanezhad • Oct 17 '23 11:10