sharp
Best config for OCR-ready PNGs
Question about an existing feature
What are you trying to achieve?
We are trying to achieve the most accurate OCR results possible. The images will be invoices and receipts, mostly photographed by users with their phones.
We want to downscale unnecessarily large images and reduce AI token usage by sending fewer pixels.
Please provide a minimal, standalone code sample, without other dependencies, that demonstrates this question
Current config:
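import sharp from 'sharp'

export async function optimizeImage(buffer: Buffer): Promise<Buffer> {
  const processedBuffer = await sharp(buffer)
    .rotate() // auto-orient using the EXIF orientation tag
    .resize({
      width: 2000,
      height: 2000,
      withoutEnlargement: true, // never upscale smaller images
      fit: 'inside', // preserve aspect ratio within the 2000x2000 box
    })
    .grayscale()
    .normalise() // stretch contrast to cover the full range
    .sharpen({
      sigma: 1.2,
      m1: 0.5,
      m2: 0.5,
    })
    .png()
    .toBuffer()
  return processedBuffer
}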
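To reason about how many pixels actually get sent after the resize step, the effect of fit: 'inside' combined with withoutEnlargement: true can be approximated in plain TypeScript. This is only a sketch: the fitInside helper and the Size type are hypothetical names, not part of sharp, and the exact rounding sharp/libvips applies may differ by a pixel.

```typescript
interface Size {
  width: number
  height: number
}

// Approximates the output size of
// resize({ width, height, fit: 'inside', withoutEnlargement: true }).
// Hypothetical helper for estimation only; sharp's own rounding may differ slightly.
function fitInside(input: Size, box: Size): Size {
  // Use the smaller scale factor so both dimensions fit within the box.
  const scale = Math.min(box.width / input.width, box.height / input.height)

  // withoutEnlargement: images that already fit are left at their original size.
  if (scale >= 1) {
    return { width: input.width, height: input.height }
  }

  return {
    width: Math.max(1, Math.round(input.width * scale)),
    height: Math.max(1, Math.round(input.height * scale)),
  }
}

// A 4000x3000 phone photo is halved, while a 1200x800 image passes through unchanged.
const large = fitInside({ width: 4000, height: 3000 }, { width: 2000, height: 2000 })
const small = fitInside({ width: 1200, height: 800 }, { width: 2000, height: 2000 })
console.log(large, small)
```

Comparing width * height before and after gives a rough idea of the pixel savings for a given box size.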
These all look like good operations to try. The parameters you've chosen for sharpen are typically more suitable for printing onto paper, so their suitability will depend on how the "AI" model you're using has been trained. Perhaps also experiment with contrast limiting adaptive histogram equalization (CLAHE).
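As a starting point for that experiment, sharp exposes CLAHE as the clahe operation, which takes a tile width and height in pixels plus an optional maxSlope. The sketch below derives those options from the image dimensions. The claheOptionsFor helper, the ClaheOptions type, and the default of 8 tiles per side are assumptions chosen for illustration, not recommendations from the sharp documentation; maxSlope of 3 mirrors sharp's documented default.

```typescript
interface ClaheOptions {
  width: number // tile width in pixels
  height: number // tile height in pixels
  maxSlope: number // contrast limit; 0 disables limiting
}

// Hypothetical helper: splits the image into roughly tilesPerSide x tilesPerSide
// tiles and returns options in the shape accepted by sharp's clahe().
function claheOptionsFor(
  imageWidth: number,
  imageHeight: number,
  tilesPerSide = 8, // assumed starting value, tune against real receipts
  maxSlope = 3,
): ClaheOptions {
  if (!Number.isInteger(tilesPerSide) || tilesPerSide < 1) {
    throw new RangeError('tilesPerSide must be a positive integer')
  }
  return {
    width: Math.max(1, Math.ceil(imageWidth / tilesPerSide)),
    height: Math.max(1, Math.ceil(imageHeight / tilesPerSide)),
    maxSlope,
  }
}

// For a 2000x1500 image this yields 250x188 pixel tiles.
console.log(claheOptionsFor(2000, 1500))
```

In the pipeline above, one option would be to try .clahe(claheOptionsFor(width, height)) in place of .normalise(), reading width and height from the resized image, and then compare OCR accuracy on a sample of real invoices before settling on either.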