tesserocr Segmentation Fault(core dumped) when running WordFontAttribute with oem=0

I have the following piece of code, which loops over a list of images and gets the word font attributes for each. It works fine for some images, but after some time it crashes with the error:

Segmentation Fault(core dumped)

If I run the code on just the image where the loop crashed, it works for some images but gives the same error for others. The code is:

from tesserocr import PyTessBaseAPI
import PIL.Image

for image in image_list:
	with PyTessBaseAPI(oem=0) as api:
		image1 = PIL.Image.open(path + image)
		api.SetImage(image1)
		api.Recognize()
		iterator = api.GetIterator()
		if iterator.WordFontAttributes() is None:
			print "None"

I'm running this on Ubuntu 14.04 with Python 2.7 and Tesseract 4.00.

Dec 27 '17 12:12 vatsal28

Please provide an example image in order to reproduce.

Dec 29 '17 19:12 sirfz

It works fine with a single image, but when running on multiple images (scanning a folder to build the list and then processing it), it gives the error. You can use any image you want, but it will work for a single image.

Jan 02 '18 15:01 vatsal28

Since the code fails at some point in the loop, the image at that specific step is likely the culprit, so it would be good to identify this image and debug the problem with it.
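One possible way to identify the image, sketched under assumptions: run each image in its own child interpreter, so a segfault kills only the child and the parent can record which image triggered it. The names `CHILD_CODE`, `run_isolated` and `find_crashing_images` are hypothetical helpers, not part of tesserocr, and the negative exit status convention applies to POSIX systems only.

```python
import subprocess
import sys

# Hypothetical child script: OCR the single image passed as the first argument.
CHILD_CODE = """
import sys
from tesserocr import PyTessBaseAPI
with PyTessBaseAPI(oem=0) as api:
    api.SetImageFile(sys.argv[1])
    api.Recognize()
    iterator = api.GetIterator()
    print(iterator.WordFontAttributes())
"""


def run_isolated(image_path, child_code=CHILD_CODE):
    """Run child_code in a fresh interpreter and return its exit status."""
    return subprocess.call([sys.executable, "-c", child_code, image_path])


def find_crashing_images(image_paths, child_code=CHILD_CODE):
    """Return (image_path, signal_number) for every child killed by a signal."""
    crashed = []
    for image_path in image_paths:
        status = run_isolated(image_path, child_code)
        # On POSIX, a negative status means the child was killed by signal -status
        # (11 is SIGSEGV on Linux).
        if status < 0:
            crashed.append((image_path, -status))
    return crashed
```

If no image crashes in isolation but the loop still does, that would point toward state accumulating across images rather than toward a particular file.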

Jan 02 '18 16:01 sirfz

I thought so too, but then I removed that image and tried again, and still got the same error after some time. I've repeated this exercise multiple times, each time deleting the image where the crash happened, but there's no specific pattern in the images. Also, if I run this on just that image, it works. Is there anything I need to change in the code?

Jan 03 '18 09:01 vatsal28

Try re-using the same API instead of re-initializing for each image:

with PyTessBaseAPI(oem=0) as api:
    for image in image_list:
        image1 = PIL.Image.open(path + image)
        api.SetImage(image1)
        api.Recognize()
        iterator = api.GetIterator()
        if iterator.WordFontAttributes() is None:
            print "None"

Jan 03 '18 17:01 sirfz

Same error again :(

Jan 04 '18 09:01 vatsal28

Bump

Jan 10 '18 09:01 vatsal28

I can't debug this as I have no way to reproduce it. I personally suspect specific images, but according to you that's not the case. That makes this a possible problem with tesseract itself, or maybe with your environment, and I can't help with that.

Jan 10 '18 17:01 sirfz

@vatsal28 try to use api.SetImageFile instead of using PIL.Image.open.
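A minimal sketch of that suggestion, assuming the images sit in one folder and that a single API instance is reused. `list_image_paths`, `word_font_attributes` and `IMAGE_EXTENSIONS` are hypothetical names introduced only for illustration. tesserocr is imported inside the function so the folder-scanning helper can be used without it.

```python
import os

# Assumed set of extensions; adjust to whatever the folder actually contains.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp")


def list_image_paths(folder):
    """Return the sorted full paths of files in folder with an image extension."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )


def word_font_attributes(image_paths):
    """Map each path to the font attributes of the first word on the page."""
    from tesserocr import PyTessBaseAPI  # needs tesserocr and the legacy (oem=0) data

    results = {}
    with PyTessBaseAPI(oem=0) as api:
        for image_path in image_paths:
            # Let Tesseract read the file itself instead of passing a PIL image.
            api.SetImageFile(image_path)
            api.Recognize()
            iterator = api.GetIterator()
            if iterator is None:
                results[image_path] = None
            else:
                results[image_path] = iterator.WordFontAttributes()
    return results
```

Usage would look like `word_font_attributes(list_image_paths(path))`. Whether this avoids the crash is not confirmed anywhere in this thread.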

Feb 06 '18 10:02 s-alexey
