Best config for OCR-ready PNGs

Open hymair opened this issue 10 months ago • 1 comment

Question about an existing feature

What are you trying to achieve?

We are trying to achieve the best possible (most accurate) OCR results. The images will mostly be photos of invoices and receipts taken by users with their phones.

We want to downscale unnecessarily large images and reduce AI token usage by sending fewer pixels.

Please provide a minimal, standalone code sample, without other dependencies, that demonstrates this question

Current config:

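import sharp from 'sharp'

export async function optimizeImage(buffer: Buffer): Promise<Buffer> {
    const processedBuffer = await sharp(buffer)
        // Auto-orient using the EXIF orientation tag written by phone cameras
        .rotate()
        // Fit inside a 2000x2000 box, keeping aspect ratio and never upscaling
        .resize({
            width: 2000,
            height: 2000,
            withoutEnlargement: true,
            fit: 'inside',
        })
        .grayscale()
        // Stretch contrast to use the full dynamic range
        .normalise()
        .sharpen({
            sigma: 1.2,
            m1: 0.5,
            m2: 0.5,
        })
        .png()
        .toBuffer()

    return processedBuffer
}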
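
To get a rough feel for how many pixels the resize step removes, here is a small sketch. The helpers fitInside and pixelSavings are hypothetical names made up for illustration, not part of sharp, and the Math.round rounding is an assumption that may differ from sharp by a pixel in edge cases.

```typescript
// Hypothetical helper: estimates the output size of
// resize({ width: maxSide, height: maxSide, fit: 'inside', withoutEnlargement: true }).
export function fitInside(
    width: number,
    height: number,
    maxSide = 2000,
): { width: number; height: number } {
    // Scale so the longer side fits the box; capped at 1 so small images are never enlarged.
    const scale = Math.min(1, maxSide / Math.max(width, height))
    return {
        width: Math.round(width * scale),
        height: Math.round(height * scale),
    }
}

// Hypothetical helper: fraction of pixels removed by the downscale (0 = nothing removed).
export function pixelSavings(width: number, height: number, maxSide = 2000): number {
    const out = fitInside(width, height, maxSide)
    return 1 - (out.width * out.height) / (width * height)
}

// A 12 MP phone photo (4032x3024) ends up at 2000x1500.
const photo = fitInside(4032, 3024)
console.log(photo) // { width: 2000, height: 1500 }
console.log(pixelSavings(4032, 3024).toFixed(2)) // "0.75"
```

How pixel count translates into token usage depends on the model, so this only measures the image-side saving.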

Apr 07 '25 19:04 hymair

These all look like good operations to try. The parameters you've chosen for sharpen are typically more suitable for printing onto paper, so their suitability will depend on how the "AI" model you're using has been trained. Perhaps also experiment with contrast limiting adaptive histogram equalization (CLAHE), available in sharp as clahe().
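
As a minimal sketch of the CLAHE suggestion, the variant below swaps normalise() and sharpen() for sharp's clahe() operation. The function name optimizeImageWithClahe is hypothetical, and the 64x64 tile size and maxSlope of 3 are untuned starting points chosen for illustration, not recommended values; they would need to be compared against real OCR results.

```typescript
import sharp from 'sharp'

// Hypothetical variant of optimizeImage that uses CLAHE for local contrast.
export async function optimizeImageWithClahe(buffer: Buffer): Promise<Buffer> {
    return sharp(buffer)
        // Auto-orient using EXIF metadata
        .rotate()
        // Same bounding box as the original config; never upscale
        .resize({
            width: 2000,
            height: 2000,
            withoutEnlargement: true,
            fit: 'inside',
        })
        .grayscale()
        // Equalise contrast per tile; width/height are the tile size in pixels
        // (assumed values), maxSlope limits how much contrast is amplified
        .clahe({
            width: 64,
            height: 64,
            maxSlope: 3,
        })
        .png()
        .toBuffer()
}
```

Because CLAHE works on local tiles, it may cope better than a global normalise() with receipts photographed under uneven lighting, but that is worth verifying on a sample of your own images.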

May 19 '25 08:05 lovell