tesserocr

Segmentation fault (core dumped) when running WordFontAttributes with oem=0

Open vatsal28 opened this issue 7 years ago • 9 comments

I have the following piece of code, which runs over all the images in a loop and gets the word font attributes for each. It works fine for some of the images, but after some time it crashes with the error:

Segmentation fault (core dumped)

If I run the code on just the image on which the loop crashed, it works and produces output for some images but gives the same error for others. The code is:

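import PIL.Image
from tesserocr import PyTessBaseAPI

for image in image_list:
    # A new API instance is created (and torn down) for every image.
    with PyTessBaseAPI(oem=0) as api:
        image1 = PIL.Image.open(path + image)
        api.SetImage(image1)
        api.Recognize()
        iterator = api.GetIterator()
        # The iterator starts at the first word; report missing font attributes.
        if iterator.WordFontAttributes() is None:
            print("None")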

I'm running this on Ubuntu 14.04 with Python 2.7 and Tesseract 4.00.

vatsal28, Dec 27 '17 12:12

Please provide an example image in order to reproduce.

sirfz, Dec 29 '17 19:12

It works fine with a single image, but it fails when running on multiple images (scanning a folder to build the list of images and then processing them). You can use any image you want, but it will work when that image is processed on its own.

vatsal28, Jan 02 '18 15:01

Since the code fails at some point in the loop, the image at that specific step is likely the culprit, so it would be good to identify that image and debug the problem with it.

sirfz, Jan 02 '18 16:01
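
One way to test whether a specific image is the culprit is to process each image in its own interpreter, so that a segfault kills only that child process and leaves a record of which file triggered it. The following is a minimal sketch under assumptions, not something posted in the thread: the helper name find_crashing_inputs and the CHILD_CODE snippet are hypothetical, it targets Python 3 on a POSIX system, and it relies on subprocess reporting a negative return code when the child is killed by a signal.

```python
import subprocess
import sys

# Hypothetical child program: processes exactly one image (sys.argv[1]) and exits.
# It mirrors the loop body from the report above.
CHILD_CODE = """
import sys
import PIL.Image
from tesserocr import PyTessBaseAPI

with PyTessBaseAPI(oem=0) as api:
    api.SetImage(PIL.Image.open(sys.argv[1]))
    api.Recognize()
    iterator = api.GetIterator()
    print(iterator.WordFontAttributes())
"""


def find_crashing_inputs(paths, child_code=CHILD_CODE):
    """Run child_code once per path in a fresh interpreter.

    Returns a list of (path, signal_number) for every child that was
    terminated by a signal (for example 11 for SIGSEGV).
    """
    crashed = []
    for p in paths:
        # On POSIX, a negative return code -N means "killed by signal N".
        rc = subprocess.call([sys.executable, "-c", child_code, p])
        if rc < 0:
            crashed.append((p, -rc))
    return crashed
```

If no image crashes in isolation, as described above, the fault more likely depends on state accumulated across iterations than on any single file.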

I thought so too, but then I removed that image and tried again, and still got the same error after some time. I've repeated this exercise multiple times, each time deleting the image where the crash happened, but there's no specific pattern in the images. Also, if I run this on just that image, it works. Is there anything I need to change in the code?

vatsal28, Jan 03 '18 09:01

Try re-using the same API instead of re-initializing for each image:

with PyTessBaseAPI(oem=0) as api:
    for image in image_list:
        image1 = PIL.Image.open(path + image)
        api.SetImage(image1)
        api.Recognize()
        iterator = api.GetIterator()
        if iterator.WordFontAttributes() is None:
            print("None")

sirfz, Jan 03 '18 17:01

Same error again :(

vatsal28, Jan 04 '18 09:01

Bump

vatsal28, Jan 10 '18 09:01

I can't debug this as I have no way to reproduce it. I personally suspect specific images, but according to you that's not the case. That makes this a possible problem with tesseract itself, or maybe with your environment, and I can't help with that.

sirfz, Jan 10 '18 17:01

@vatsal28 try using api.SetImageFile instead of PIL.Image.open.

s-alexey, Feb 06 '18 10:02
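
The SetImageFile suggestion could look roughly like the sketch below, which also reuses a single API object as proposed earlier in the thread. It is an illustration only: the function names word_font_attributes_for and run are hypothetical, the folder argument is a placeholder, and the code assumes Python 3. Whether bypassing PIL actually avoids the crash was not confirmed in the thread.

```python
import os


def word_font_attributes_for(paths, api):
    """Return {path: font attributes of the first word, or None} for each path.

    `api` is expected to behave like an open tesserocr.PyTessBaseAPI.
    """
    results = {}
    for p in paths:
        # Let Tesseract read the file itself instead of passing a PIL image.
        api.SetImageFile(p)
        api.Recognize()
        iterator = api.GetIterator()
        results[p] = iterator.WordFontAttributes()
    return results


def run(folder):
    """Hypothetical driver: process every file in `folder` with one shared API."""
    # Imported lazily so the helper above can be exercised without tesserocr.
    from tesserocr import PyTessBaseAPI

    paths = [os.path.join(folder, name) for name in sorted(os.listdir(folder))]
    with PyTessBaseAPI(oem=0) as api:
        for p, attrs in word_font_attributes_for(paths, api).items():
            print(p, attrs)
```

Calling run on the image folder would print one line per file. Keeping the loop in a separate function that takes the API as a parameter makes it straightforward to swap in a stub object while debugging.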