[BUG] Not enough memory to initialize Tesseract

Aradmey opened this issue 7 years ago (19 comments)

CCExtractor version (preferably from the --version parameter): 0.85

In raising this issue, I confirm the following (please check boxes, e.g. [X], and delete unchecked ones):

  • [x] I have read and understood the contributors guide.
  • [X] I have checked that the bug-fix I am reporting can be replicated, or that the feature I am suggesting isn't already present.
  • [X] I have checked that the issue I'm posting isn't already reported.
  • [X] I have checked that the issue I'm posting isn't already solved and no duplicates exist in closed issues and in opened issues
  • [X] I have used the latest available version of CCExtractor to verify this issue exists.

My familiarity with the project is as follows (check one, e.g. [X], and delete unchecked ones):

  • [X] I have never used CCExtractor.

Necessary information

  • Is this a regression (did it work before)? [X] NO
  • What platform did you use? [X] Windows
  • What were the used arguments? E:\CCExtractor\ccextractorwinfull.exe --gui_mode_reports -in=mp4 -autoprogram -out=srt -bom -unicode -hardsubx -subcolor white -conf_thresh 60 [+input files]

Additional information

Hello, I've tried using the program to extract burned-in subtitles from an .mp4 movie, but it always shows this error: "Not enough memory to initialize Tesseract!" Is there any known solution for this issue?

Aradmey, Oct 22 '18

It seems your computer doesn't have the resources to run Tesseract, so the problem isn't with CCExtractor but with the computer running it.

MatejMecka, Oct 22 '18

Could you please post the complete logs, along with the procedure you followed to compile CCExtractor?

saurabhshri, Oct 22 '18

> It seems your computer doesn't have the power to run Tesseract. Therefore there isn't any issue with CCExtractor but with your computer running it.

I doubt it, as my PC has 16 GB of RAM.

> Could you please post complete logs along with the procedure you followed to compile CCExtractor?

I did not compile CCExtractor; I downloaded the binaries (GUI and command-line programs) and later the installer itself. None of them worked. All I did was open CCExtractor, select my file, select "With OCR" below, tick "Perform burned-in subtitle extraction" and start the extraction, after which I received the error mentioned above.

Aradmey, Oct 22 '18

This is also happening on Ubuntu 18.04 with CCExtractor compiled from master against the Tesseract package from the standard repositories. If I manually extract frames with ffmpeg and run Tesseract on them, there are no complaints about memory on my 8 GB Dell XPS laptop.

My C++ is almost non-existent, but from reading the Tesseract code it looks like this may have nothing to do with memory at all. https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/hardsubx.c#L238 assumes that any non-zero return value means "Not enough memory to initialize Tesseract", but I don't see anything in https://github.com/tesseract-ocr/tesseract/blob/master/src/api/capi.cpp#L241 or https://github.com/tesseract-ocr/tesseract/blob/master/src/api/baseapi.h#L189 suggesting that a non-zero value is guaranteed to be memory-related. It simply says:

> Start tesseract. Returns zero on success and -1 on failure.
> NOTE that the only members that may be called before Init are those
> listed above here in the class definition.

I may well be looking in the wrong place, but it seems to me that this could be something other than insufficient memory.

AntonOfTheWoods, Nov 17 '18
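To make the point above concrete, here is a minimal C sketch of how an init failure could be reported without assuming a cause. It does not call Tesseract; `rc` stands for whatever the real init call returned, which `baseapi.h` only documents as 0 on success and -1 on failure. The helpers `tess_init_error_message` and `traineddata_exists` are hypothetical, and the extra check for `<tessdata_dir>/<lang>.traineddata` is an assumption for illustration (missing language data is one possible reason for init to fail), not something CCExtractor is known to do.

```c
#include <stdio.h>

/* baseapi.h documents Init as returning 0 on success and -1 on failure,
   with no further detail about the cause. */
#define TESS_INIT_OK 0

/* Hypothetical helper: returns 1 if "<tessdata_dir>/<lang>.traineddata"
   can be opened for reading, 0 otherwise. */
static int traineddata_exists(const char *tessdata_dir, const char *lang)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s.traineddata", tessdata_dir, lang);
    if (n < 0 || (size_t)n >= sizeof path)
        return 0; /* path could not be built in full: treat as missing */

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}

/* Hypothetical replacement for a hard-coded "Not enough memory" message.
   rc is the value returned by the Tesseract init call.
   Returns NULL on success, otherwise a message that does not guess at memory. */
static const char *tess_init_error_message(int rc, const char *tessdata_dir,
                                           const char *lang)
{
    if (rc == TESS_INIT_OK)
        return NULL;
    if (!traineddata_exists(tessdata_dir, lang))
        return "Failed to initialize Tesseract: language data file not found";
    return "Failed to initialize Tesseract (init returned non-zero; cause unknown)";
}
```

Where such a message would replace the current one in `hardsubx.c` is not shown; the sketch only illustrates keeping "init failed" separate from "out of memory".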

@saurabhshri , do you have any ideas about this? Am I completely wrong in my interpretation of the code?

AntonOfTheWoods, Nov 26 '18

@AntonOfTheWoods No, you're not. It's not your machine. It has been reported previously, but they were able to solve it. Happy debugging :)

saurabhshri, Nov 26 '18

OK, let's try to figure this one out. @Aradmey, first: does it happen with all files, just some, or one specific file? Can you share one?

Have you tried 0.87?

cfsmp3, Nov 26 '18

@Aradmey , was it you that was able to solve it or did you abandon CCExtractor?

AntonOfTheWoods, Nov 30 '18

@cfsmp3, I am using master rather than an official release (like 0.87) so that I get support for Tesseract 4 (the version available on Ubuntu 18.04); the git log suggests I need the HEAD of origin/master for that. Could this simply be a matter of Tesseract 4 not being fully supported yet? I have also tried the latest Tesseract version from https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr and get the same error. Is it worth installing Tesseract 3 and using 0.87? Thanks.

AntonOfTheWoods, Nov 30 '18

Do give Tesseract 3 a try. In any case it's going to be faster; Tesseract 4 seems better at handling handwriting, but for our use it doesn't seem like a great upgrade.

cfsmp3, Nov 30 '18

@Aradmey and @cfsmp3, I can confirm that manually compiling Tesseract 3.05 on Ubuntu 18.04, then compiling CCExtractor from master and pointing it at Tesseract 3, gets rid of the error. I definitely think the error message could be improved, though!

AntonOfTheWoods, Dec 01 '18
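Since the error went away once CCExtractor was built against Tesseract 3.05 instead of 4, a mismatch between the Tesseract version a build expects and the one it actually loads is a plausible cause, though the thread does not confirm it. Under that assumption, here is a minimal C sketch of a version guard. It parses the major version from a string like the one returned by Tesseract's C API function `TessVersion()`; the helpers `tess_major_version` and `tess_version_matches` are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: extracts the major version from strings such as
   "3.05.02" or "4.0.0". Returns -1 if the string is NULL or has no
   leading number. */
static long tess_major_version(const char *version)
{
    if (version == NULL)
        return -1;
    char *end = NULL;
    long major = strtol(version, &end, 10);
    if (end == version)
        return -1; /* no digits were parsed */
    return major;
}

/* Hypothetical guard: returns 1 if the runtime library's major version
   equals the one this build was written against, 0 otherwise. */
static int tess_version_matches(const char *runtime_version, long expected_major)
{
    long major = tess_major_version(runtime_version);
    return major >= 0 && major == expected_major;
}
```

In a real build the argument might be `TessVersion()` (for example `tess_version_matches(TessVersion(), 3)`), so that a mismatch could be reported explicitly instead of surfacing as a generic init failure.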

I'm completely new to CCExtractor. I'm encountering the same issue when running 0.87 on Windows. My steps to reproduce:

  1. Install CCExtractor on Windows 10 using the Windows installer.

  2. Run the GUI version with the following options:

C:\Program Files (x86)\CCExtractor\ccextractorwinfull.exe --gui_mode_reports -out=srt -bom -latin1 -hardsubx -subcolor white -conf_thresh 60 [+input files]

When I click the "Start" button, I get the message "Not enough memory to initialize Tesseract."

Could this be a problem with Windows support for Tesseract? I noticed that the Windows version seems to be lagging behind the Linux version.

For what it's worth, I'm trying to use CCExtractor to make some HBO shows more accessible. The show "My Brilliant Friend" is spoken in Italian and has burned-in English subtitles, but those aren't accessible to English-speaking blind users. Details at the link below. If there's a way to OCR these subtitles, that would be completely amazing.

https://www.huffingtonpost.com/entry/hbo-discriminates-against-blind_us_5be073e1e4b04367a87f1cab

RobJacobson, Dec 11 '18

Same issue as above. Is there any solution yet? Should I run it with different options? Many thanks.

anonynamja, Jan 15 '19

Same issue here

Pi7on, Jan 22 '19

Same issue. Windows 10, Tesseract 3 is installed and in my system PATH.

bioluminesceme, Mar 08 '19

Has this been fixed yet?

DaniGTA, Apr 11 '19

@DaniGTA I believe this problem was already fixed by #1083, which changed the way Tesseract is initialized. Previously, if Tesseract failed to initialize for any reason, you got a memory error; #1083 made the initialization more stable. I'd ask anybody who had this error to check again with CCExtractor's master.

thelastpolaris, Apr 12 '19

Hello,

I was having the same problem (an error message while running the 0.87 GUI: "Not enough memory to initialize Tesseract"), so I cloned master and compiled it on Windows 10 using Visual Studio 2019 (Community) and the instructions given here. However, when I launch the new GUI I see the following message:

[Screenshot: ccext]

I have tried compiling with both the Debug and Release configurations. Has anyone else had this problem, or does anyone have an idea why the library can't be found?

drodz11, May 09 '19

Closing, as this seems to be fixed. Feel free to comment if anyone is still having this problem in current master.

cfsmp3, Nov 21 '21