
Worker loading language traineddata progress 0

Open IAndreaGiuseppe opened this issue 5 years ago • 9 comments

Describe the bug: Using basic example code, I'm unable to extract text from an image.

Object { status: "loading tesseract core", progress: 0 }
Object { status: "loading tesseract core", progress: 1 }
Object { workerId: "Worker-0-ac418", status: "initializing tesseract", progress: 0 }
Object { workerId: "Worker-0-ac418", status: "initialized tesseract", progress: 1 }
Object { workerId: "Worker-0-ac418", status: "loading language traineddata", progress: 0 }

After this point nothing happens.

To Reproduce

<template>
    <div>
        <button v-on:click="recognize">recognize</button>
    </div>
</template>

<script>
import { createWorker } from "tesseract.js";

const worker = createWorker({
    logger: m => console.log(m)
});

export default {
    name: "ocr-reader",

    methods: {
        // The handler must be async, otherwise `await` is a syntax error.
        "recognize": async function() {
            await worker.load();
            await worker.loadLanguage("eng");
            await worker.initialize("eng");
            const {
                data: { text }
            } = await worker.recognize("http://localhost:8000/WEZK8.png");
            console.log(text);
            await worker.terminate();
        }
    }
};
</script>

simplest Vue component

Expected behavior: I expect to see the extracted text logged to the console.

Additional context: I'm testing on my localhost. I checked that everything is loaded correctly; even the traineddata file is downloaded with status 200.

IAndreaGiuseppe avatar Feb 24 '20 15:02 IAndreaGiuseppe

Sooo, the problem is in Firefox; with Chrome I can get the text extracted.

IAndreaGiuseppe avatar Feb 24 '20 15:02 IAndreaGiuseppe

I'm using chrome and experiencing the same problem.

barryZZJ avatar Feb 28 '20 13:02 barryZZJ

I have tried both Chrome and Firefox and it works perfectly. @IAndreaGiuseppe may I know your version of Firefox? @barryZZJ You might be facing a network issue; you can try this offline version to verify: https://github.com/jeromewu/tesseract.js-offline

jeromewu avatar Mar 09 '20 13:03 jeromewu

@jeromewu I'm actually on FF 73.0.1 (64bit) on Windows

IAndreaGiuseppe avatar Mar 09 '20 14:03 IAndreaGiuseppe

Hi @IAndreaGiuseppe, I have tried Firefox 74.0 (64bit) on Windows and it still works. Maybe try a Private Window in Firefox to rule out a potential cache issue, and wait a little longer if your network is not fast.

jeromewu avatar Mar 12 '20 09:03 jeromewu

Hi @jeromewu, and thank you. I'm almost back on this subject and will be able to test this process again soon. Please don't close this issue.

IAndreaGiuseppe avatar Mar 23 '20 18:03 IAndreaGiuseppe

I experience the same problem in firefox. If langPath is set to a remote path such as 'https://tessdata.projectnaptha.com/4.0.0_fast', it works fine. However, if langPath is set to a relative path inside the extension, it fails to load lang data. No issue in chrome.

chungym avatar Dec 22 '21 07:12 chungym

I'm experiencing the same issue, but with any langPath. I tried the default one, https://tessdata.projectnaptha.com/4.0.0_fast, and a relative path. All variants are stuck on loading language traineddata:

 { workerId: "Worker-0-2c495", status: "loading language traineddata", progress: 0, userJobId: "Job-1-6281d" }

Ubuntu, Firefox 96.0.2 (64-bit)

TwoAbove avatar Jan 27 '22 04:01 TwoAbove

The fix for me was to add

      cacheMethod: 'none'
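
For anyone wondering where that goes: a minimal sketch, assuming the option sits in the same options object that the original example passes to createWorker (next to logger). The name workerOptions is only illustrative.

```javascript
// Options object for createWorker. `logger` is from the original example;
// `cacheMethod: 'none'` skips the traineddata cache entirely.
const workerOptions = {
    logger: m => console.log(m),
    cacheMethod: 'none'
};

// Assumed usage, mirroring the original example:
//   const worker = createWorker(workerOptions);
```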

TwoAbove avatar Jan 27 '22 06:01 TwoAbove

I think the only thing immediately actionable here is that the promise returned by worker.loadLanguage is neither resolved nor rejected (no error message; it just gets "stuck"). Once that is fixed, users can catch the error and try again with a different cacheMethod, and if there is an underlying bug in Tesseract.js we will have an error message to work from.

It looks like this is the offending part: when a DOMException is encountered, the promise is never rejected. I will edit it so that all errors lead to a rejected promise.
https://github.com/naptha/tesseract.js/blob/bce7cd84fe823ca970854fe9c2c76d0c75051447/src/worker-script/index.js#L147-L153
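
Until that lands, a caller-side workaround could look roughly like the sketch below: put a timeout on each step so a stuck promise turns into a rejection, then retry with the next cacheMethod. This is not the library fix, only an illustration. withTimeout, loadWithFallback and makeWorker are hypothetical names; makeWorker is assumed to be something like opts => createWorker(opts), and the 30 second default is arbitrary.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(
            () => reject(new Error(`${label} timed out after ${ms} ms`)),
            ms
        );
    });
    // Clear the timer whichever side wins the race.
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Try each cacheMethod in order until a worker is fully initialized.
// `makeWorker` is a hypothetical factory: opts => createWorker(opts).
async function loadWithFallback(makeWorker, lang, cacheMethods = ["write", "none"], ms = 30000) {
    let lastError;
    for (const cacheMethod of cacheMethods) {
        const worker = await makeWorker({ cacheMethod });
        try {
            await withTimeout(worker.load(), ms, "load");
            await withTimeout(worker.loadLanguage(lang), ms, "loadLanguage");
            await withTimeout(worker.initialize(lang), ms, "initialize");
            return worker; // ready for worker.recognize(...)
        } catch (err) {
            lastError = err;
            await worker.terminate(); // drop the stuck or failed worker
        }
    }
    throw lastError; // every cacheMethod failed
}
```

The returned worker can then be used for recognize as in the original example. If every cacheMethod fails, the last error is rethrown so there is at least a message to report.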

Balearica avatar Sep 20 '22 05:09 Balearica

I updated the master branch, so now every error should lead to a message/promise rejection. This will be reflected in the next npm release (3.0.4). If anybody using version >=3.0.4 encounters either a non-resolving promise from worker.loadLanguage or an error due to an underlying bug, they should open a new issue.

Balearica avatar Sep 21 '22 01:09 Balearica