Heroku tesseract build pack support?

Open ansonl opened this issue 7 years ago • 3 comments

Is there a working configuration for using gosseract with one of the Heroku tesseract buildpacks, such as https://github.com/Dkevs/heroku-buildpack-tesseract?

When compiling the Go app, Heroku gives this error:

tessbridge.cpp:5:31: fatal error: tesseract/baseapi.h: No such file or directory
remote: compilation terminated.

I've tried setting CGO_CFLAGS on Heroku with heroku config:set CGO_CFLAGS='-I ${build_dir}/tesseract/../', to no avail.

I see the example Heroku project uses Docker and installs libtesseract-dev. I'm wondering whether gosseract is only tested with Docker, and whether you can recommend a buildpack for libtesseract-dev.
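One way to narrow down a missing-header error like this is to check whether any -I directory in CGO_CFLAGS actually contains tesseract/baseapi.h in the build environment. The following is only a minimal diagnostic sketch: the helper names includeDirs and findHeader are hypothetical, and the parsing is deliberately naive (it splits on whitespace and does not handle quoted paths). If the variable was stored with an unexpanded placeholder such as ${build_dir}, that would show up literally in the printed list.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// includeDirs extracts the directories named by -I flags in a CFLAGS-style
// string. It accepts both "-Idir" and "-I dir" and ignores all other flags.
// Hypothetical helper for illustration; it does not handle quoted paths.
func includeDirs(cflags string) []string {
	var dirs []string
	fields := strings.Fields(cflags)
	for i := 0; i < len(fields); i++ {
		f := fields[i]
		switch {
		case f == "-I" && i+1 < len(fields):
			// "-I dir": the directory is the next field.
			i++
			dirs = append(dirs, fields[i])
		case strings.HasPrefix(f, "-I") && len(f) > 2:
			// "-Idir": the directory is attached to the flag.
			dirs = append(dirs, f[2:])
		}
	}
	return dirs
}

// findHeader returns the first directory in dirs that contains header.
func findHeader(dirs []string, header string) (string, bool) {
	for _, d := range dirs {
		if _, err := os.Stat(filepath.Join(d, header)); err == nil {
			return d, true
		}
	}
	return "", false
}

func main() {
	dirs := includeDirs(os.Getenv("CGO_CFLAGS"))
	fmt.Println("include dirs from CGO_CFLAGS:", dirs)
	if dir, ok := findHeader(dirs, "tesseract/baseapi.h"); ok {
		fmt.Println("found tesseract/baseapi.h under", dir)
	} else {
		fmt.Println("tesseract/baseapi.h not found in any -I directory")
	}
}
```

Running this in the same environment as the build (for example as a one-off step before compiling) shows which paths the compiler would be asked to search.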

ansonl avatar Apr 15 '18 05:04 ansonl

Hi, @ansonl

First, have you tried LD_LIBRARY_PATH?

I personally recommend using Docker for your Heroku application because it's more flexible and easier to handle.

I'll try the buildpack when I have time.

otiai10 avatar Apr 15 '18 06:04 otiai10

Unfortunately, I wasn't able to get the libtesseract-dev buildpack working. I ended up just calling the tesseract command through os/exec for a project that uses tesseract a couple hundred times.

This also confirmed what seems to be a memory leak in the Tesseract BaseAPI End() function. When calling End() and letting the client go out of scope, memory usage decreases slightly, but each client struct created still leaves a couple of megabytes in use.

This can be seen by running the test program below:
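For anyone taking the same route, a minimal sketch of shelling out to the tesseract CLI is shown here. It assumes a tesseract binary is on PATH and relies on the CLI convention that passing stdout as the output base prints the recognized text. The function name ocrWithCLI and its bin parameter are hypothetical, added only so the command can be swapped out.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// ocrWithCLI runs "<bin> <imagePath> stdout" and returns whatever the command
// writes to standard output. With the tesseract CLI, "stdout" as the output
// base prints the recognized text instead of writing a file.
// Hypothetical helper for illustration.
func ocrWithCLI(bin, imagePath string) (string, error) {
	out, err := exec.Command(bin, imagePath, "stdout").Output()
	if err != nil {
		return "", fmt.Errorf("running %s on %s: %w", bin, imagePath, err)
	}
	return string(out), nil
}

func main() {
	// Assumes tesseract is installed and the image exists in the working directory.
	text, err := ocrWithCLI("tesseract", "002-confusing.png")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Print(text)
}
```

Because every call starts a fresh process, any memory the library holds is returned to the operating system when that process exits, at the cost of process startup time per image.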

package main

import (
	"fmt"
	"time"

	"github.com/otiai10/gosseract"
)

func main() {
	var count int
	var clients []*gosseract.Client

	// Create 20 clients, one every 100 ms, and keep a reference to each.
	for range time.Tick(time.Millisecond * 100) {
		client := gosseract.NewClient()
		client.SetImage("002-confusing.png")
		text, _ := client.Text()
		_, _ = client.HOCRText()
		fmt.Println(text)
		count++

		clients = append(clients, client)

		if count == 20 {
			break
		}
	}

	// Wait briefly (20 ticks of 10 ms) before closing the clients.
	for range time.Tick(time.Millisecond * 10) {
		count--
		if count == 0 {
			break
		}
	}

	// Close every client, which should call the Tesseract BaseAPI End() method.
	for _, c := range clients {
		c.Close()
	}

	// Keep the process alive so its memory usage can be observed.
	for range time.Tick(time.Second * 1) {
	}
}

For some reason, memory usage remains elevated after the clients are closed, at which point the Tesseract BaseAPI End() method should have been called. I have tried calling the Go garbage collector functions, and it seems to make no difference. The only way I have found to release the memory is to exit the program.
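For reference, here is a small sketch of what forcing the Go garbage collector looks like and what it can report. The Go runtime's statistics and collector only cover memory managed by Go, so memory allocated by Tesseract's C++ code through cgo would not be counted or freed here, which may be why the calls appeared to have no effect. The names sink, goHeapBytes, and collectAndMeasure are hypothetical, and the 64 MiB buffer merely stands in for Go-side allocations. This does not show whether Tesseract itself leaks.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// sink keeps the demo allocation reachable until it is explicitly dropped.
var sink []byte

// goHeapBytes reports the bytes of allocated Go heap objects, which includes
// unreachable objects the collector has not freed yet. Memory allocated by
// C or C++ code called through cgo is not included.
func goHeapBytes() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// collectAndMeasure forces a collection, asks the runtime to return freed
// memory to the operating system, and reports the Go heap size afterwards.
// debug.FreeOSMemory already forces a collection, so the explicit runtime.GC
// call is redundant and is shown only for clarity.
func collectAndMeasure() uint64 {
	runtime.GC()
	debug.FreeOSMemory()
	return goHeapBytes()
}

func main() {
	sink = make([]byte, 64<<20) // Go-managed allocation standing in for real work
	before := goHeapBytes()
	sink = nil // drop the only reference so the collector can reclaim it
	after := collectAndMeasure()
	fmt.Printf("Go heap before: %d MiB, after: %d MiB\n", before>>20, after>>20)
}
```

If the Go heap figure drops while the process's resident memory stays high, the remaining memory is held outside the Go heap, for example by the C++ library or its allocator.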

I looked through the .cpp file and have not seen any bugs, so this may be a tesseract library issue.

ansonl avatar Apr 24 '18 01:04 ansonl

@ansonl Thank you. Would you mind opening a separate issue for this, please?

otiai10 avatar Apr 24 '18 01:04 otiai10