Errors when using Buffer from Jimp

Open panstromek opened this issue 6 years ago • 6 comments

When I try to recognize a Buffer object created with Jimp, I get errors.

Error in pixReadMem: Unknown format: no pix returned
Error in pixGetSpp: pix not defined
Error in pixGetDimensions: pix not defined
Error in pixGetColormap: pix not defined
Error in pixCopy: pixs not defined
Error in pixGetDepth: pix not defined
Error in pixGetWpl: pix not defined
Error in pixGetYRes: pix not defined
Error in pixClone: pixs not defined
Warning: Invalid resolution 0 dpi. Using 70 instead.
Error in pixClone: pixs not defined
Error in pixCopy: pixs not defined
tess->pix_binary() != nullptr:Error:Assert failed:in file /src/src/ccmain/osdetect.cpp, line 201
trap!

I guess this is because Tesseract expects the Buffer to contain a full image file with metadata, while the Buffer from Jimp is just a raw bitmap. In that case, it would be cool if Tesseract could operate on raw bitmaps. If not, please add a warning about this to the docs ;)

I should also note that this worked fine in the previous version of tesseract.js (1.0).

To Reproduce

  1. Create a project with tesseract.js 2 and Jimp, plus an image file image.png.
  2. Create a file with the following contents:
const { createWorker } = require('tesseract.js')
const Jimp = require('jimp')
const filename = 'image.png'
  
;(async () => {
  const image = await Jimp.read(filename)
  const worker = createWorker()
  await worker.load()
  await worker.loadLanguage('eng')
  await worker.initialize('eng')
  const result = await worker.recognize(image.bitmap.data)
  console.log(JSON.stringify(result.data))
  return worker.terminate()
})()
  3. Run the file.
  • OS: [Windows 10]
  • Env: [Node.js 12]
  • Version: [2.0.0-beta.2]

panstromek avatar Nov 14 '19 10:11 panstromek

A workaround for now is to let Jimp create an encoded buffer for a given MIME type with getBufferAsync:

  const buffer = await image.getBufferAsync('image/png')
  const result = await worker.recognize(buffer)

This is not ideal though, because it requires creating another buffer needlessly.

panstromek avatar Nov 14 '19 11:11 panstromek

Now that I think about it, it can't work that way, because the bitmap Buffer doesn't contain any information about dimensions. Now I don't understand how I managed to make it work before in 1.0.

Anyway, if I could pass a buffer with width/height info to Tesseract, that would be awesome ;)
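
For illustration, one way to attach width/height information to raw pixels is to wrap them in a simple container format such as BMP, whose header stores the dimensions. The helper below, rgbaToBmp, is hypothetical (it is not part of Jimp or Tesseract.js) and has not been tested against Tesseract.js. It assumes the input is tightly packed RGBA, which is how Jimp lays out image.bitmap.data, and it assumes BMP is among the formats your Tesseract.js version accepts. The alpha channel is dropped, and a second buffer is still allocated, so this does not remove the extra copy.

```javascript
// Hypothetical helper: wrap raw RGBA pixels in an uncompressed 24-bit BMP.
// Assumes `rgba` is tightly packed, 4 bytes per pixel, rows stored top-down.
function rgbaToBmp(rgba, width, height) {
  if (rgba.length !== width * height * 4) {
    throw new RangeError('rgba length does not match width * height * 4')
  }

  const HEADER_SIZE = 54 // 14-byte file header + 40-byte BITMAPINFOHEADER
  const rowSize = Math.ceil((width * 3) / 4) * 4 // BMP rows are padded to 4 bytes
  const pixelBytes = rowSize * height
  const out = Buffer.alloc(HEADER_SIZE + pixelBytes)

  // File header
  out.write('BM', 0, 'ascii')
  out.writeUInt32LE(out.length, 2) // total file size
  out.writeUInt32LE(HEADER_SIZE, 10) // offset of the pixel array

  // BITMAPINFOHEADER
  out.writeUInt32LE(40, 14) // size of this header
  out.writeInt32LE(width, 18)
  out.writeInt32LE(height, 22) // positive height means rows are stored bottom-up
  out.writeUInt16LE(1, 26) // color planes
  out.writeUInt16LE(24, 28) // bits per pixel
  out.writeUInt32LE(0, 30) // no compression
  out.writeUInt32LE(pixelBytes, 34)
  out.writeInt32LE(2835, 38) // horizontal resolution, about 72 dpi
  out.writeInt32LE(2835, 42) // vertical resolution, about 72 dpi

  // Pixel array: bottom-up rows, BGR byte order, alpha discarded
  for (let y = 0; y < height; y++) {
    const dstRow = HEADER_SIZE + (height - 1 - y) * rowSize
    for (let x = 0; x < width; x++) {
      const src = (y * width + x) * 4
      const dst = dstRow + x * 3
      out[dst] = rgba[src + 2] // blue
      out[dst + 1] = rgba[src + 1] // green
      out[dst + 2] = rgba[src] // red
    }
  }
  return out
}
```

Under those assumptions, a call might look like worker.recognize(rgbaToBmp(image.bitmap.data, image.bitmap.width, image.bitmap.height)).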

panstromek avatar Nov 14 '19 11:11 panstromek

One quick fix is to put your buffer into a canvas and pass the canvas to the recognize() function. We will add this feature in the next release.

jeromewu avatar Dec 02 '19 12:12 jeromewu

Is it already fixed?

haji8-balaan avatar Jan 25 '21 08:01 haji8-balaan

@jeromewu Maybe it's pretty easy, but would you please share example code for converting the buffer to a canvas? I did not find any proper solution. Thanks in advance.

emirom avatar Feb 17 '21 18:02 emirom

Can you show us how to convert buffer data into a canvas?

hussammaher avatar Jun 14 '22 12:06 hussammaher

Please disregard the comment above about converting to canvas. Canvas is an API native to browsers (not Node.js), and only the browser version of Tesseract.js seamlessly supports canvas inputs. It looks like jimp is a Node.js library.

Balearica avatar Aug 25 '22 02:08 Balearica

Regarding what images are supported: Tesseract.js does not support raw pixel data (which is returned by image.bitmap.data) as an input type, for either browser or Node. After reviewing the documentation, I agree that it is unclear on this point and should be clarified.

Regarding using Tesseract.js with Jimp: it looks like Jimp has multiple methods that transform the data into formats that Tesseract.js does accept. In fact, despite the follow-up comment, @panstromek's original suggested fix (using getBufferAsync) works perfectly well.

const result = await worker.recognize(await image.getBufferAsync(Jimp.MIME_PNG));

I think the cause of the confusion above is conflation of image formats with data types. Buffers that contain supported image formats (e.g. png) can be used, while buffers that do not contain supported image formats (e.g. the raw data in image.bitmap.data) are not supported.
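
To make the distinction concrete, here is a minimal sketch of a check that a Buffer starts with the signature bytes of an encoded image file. The function sniffImageFormat and the SIGNATURES table are hypothetical names introduced for illustration; this is not how Tesseract.js itself detects formats, and it covers only three formats. It is a heuristic: raw pixel data could begin with the same bytes by coincidence, especially for the two-byte BMP signature.

```javascript
// Hypothetical helper: guess whether a Buffer holds an encoded image file
// by comparing its leading bytes with well-known file signatures.
const SIGNATURES = {
  png: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a],
  jpeg: [0xff, 0xd8, 0xff],
  bmp: [0x42, 0x4d], // ASCII "BM"
}

function sniffImageFormat(buf) {
  for (const [name, signature] of Object.entries(SIGNATURES)) {
    const matches =
      buf.length >= signature.length &&
      signature.every((byte, i) => buf[i] === byte)
    if (matches) return name
  }
  return null // no known signature: possibly raw pixel data
}
```

A buffer produced by image.getBufferAsync(Jimp.MIME_PNG) begins with the PNG signature, whereas image.bitmap.data begins directly with the color bytes of the first pixel, so a check like this could be used to fail early with a clearer message than the pixReadMem errors above.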

Balearica avatar Sep 21 '22 00:09 Balearica

I updated the documentation to clarify which image formats and data types are supported. This includes adding the following note to prevent this misunderstanding from occurring again.

Note: images must be a supported image format and a supported data type. For example, a buffer containing a png image is supported. A buffer containing raw pixel data is not supported.

Balearica avatar Sep 21 '22 01:09 Balearica