tesseract.js
Worker loading language traineddata progress 0
Describe the bug: Using basic example code, I'm unable to extract text from an image.
Object { status: "loading tesseract core", progress: 0 }
Object { status: "loading tesseract core", progress: 1 }
Object { workerId: "Worker-0-ac418", status: "initializing tesseract", progress: 0 }
Object { workerId: "Worker-0-ac418", status: "initialized tesseract", progress: 1 }
Object { workerId: "Worker-0-ac418", status: "loading language traineddata", progress: 0 }
After this point nothing happens.
To Reproduce
<template>
  <div>
    <button v-on:click="recognize">recognize</button>
  </div>
</template>
<script>
import { createWorker } from "tesseract.js";

// Single shared worker; the logger prints every progress message.
const worker = createWorker({
  logger: m => console.log(m)
});

export default {
  name: "ocr-reader",
  methods: {
    // Must be async because the body uses await.
    recognize: async function() {
      await worker.load();
      await worker.loadLanguage("eng");
      await worker.initialize("eng");
      const {
        data: { text }
      } = await worker.recognize("http://localhost:8000/WEZK8.png");
      console.log(text);
      // The worker cannot be reused after terminate().
      await worker.terminate();
    }
  }
};
</script>
simplest Vue component
Expected behavior: I expect to see the extracted text printed to the console.
Additional context: I'm testing on localhost. I checked that everything loads correctly; even the traineddata file is downloaded with status 200.
So, the problem is in Firefox; with Chrome I can get the text extracted.
I'm using Chrome and experiencing the same problem.
I have tried both Chrome and Firefox and it works perfectly. @IAndreaGiuseppe may I know your version of Firefox? @barryZZJ You might be facing a network issue; you can try this offline version to verify: https://github.com/jeromewu/tesseract.js-offline
@jeromewu I'm actually on FF 73.0.1 (64bit) on Windows
Hi @IAndreaGiuseppe, I have tried Firefox 74.0 (64bit) on Windows and it still works. Maybe try a Private Window in Firefox to avoid a potential cache issue, and wait a little longer if your network is not fast.
Hi @jeromewu and thank you. I'm almost back on this subject and will be able to test this process again soon. Please don't close this issue.
I experience the same problem in Firefox. If langPath is set to a remote path such as 'https://tessdata.projectnaptha.com/4.0.0_fast', it works fine. However, if langPath is set to a relative path inside the extension, it fails to load the language data. No issue in Chrome.
I'm experiencing the same issue, but with any langPath. I tried the default one, https://tessdata.projectnaptha.com/4.0.0_fast, and a relative path. All variants are stuck on "loading language traineddata":
{ workerId: "Worker-0-2c495", status: "loading language traineddata", progress: 0, userJobId: "Job-1-6281d" }
Ubuntu, Firefox 96.0.2 (64-bit)
The fix for me was to add cacheMethod: 'none'
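For anyone wondering where that setting goes: cacheMethod is one of the options passed to createWorker. A minimal sketch, assuming the v2/v3-style API used earlier in this thread; buildWorkerOptions is a hypothetical helper name, not part of tesseract.js.

```javascript
// Hypothetical helper (not part of tesseract.js): builds the options object
// for createWorker with traineddata caching disabled.
function buildWorkerOptions(overrides = {}) {
  return {
    logger: m => console.log(m),
    // 'none': do not read the traineddata cache and do not write to it.
    cacheMethod: 'none',
    // Caller-supplied options (for example langPath) take precedence.
    ...overrides,
  };
}

// Usage, assuming the v2/v3-style API from this thread:
//   import { createWorker } from "tesseract.js";
//   const worker = await createWorker(buildWorkerOptions());
```

Disabling the cache means the traineddata is fetched again on every load, so this trades some speed for avoiding the cache path that appears to hang.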
I think the only thing immediately actionable here is that the promise returned by worker.loadLanguage is neither resolved nor rejected (there is no error message; it just gets "stuck"). Once that is fixed, users can catch the error and try again with a different cacheMethod, and if there is an underlying bug in Tesseract.js we will have an error message to work from.
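Once loadLanguage rejects instead of hanging, the catch-and-retry idea could look roughly like the sketch below. It assumes the v2/v3-style worker methods used in this thread (load, loadLanguage, initialize, terminate); loadWithFallback is a hypothetical name, and createWorker is passed in as an argument so the helper does not depend on how tesseract.js is imported.

```javascript
// Sketch: try each cacheMethod in turn until the language loads.
// loadWithFallback is a hypothetical helper, not part of tesseract.js.
async function loadWithFallback(createWorker, lang, cacheMethods = ['write', 'none']) {
  let lastError;
  for (const cacheMethod of cacheMethods) {
    // await works whether createWorker returns a worker or a promise of one.
    const worker = await createWorker({ cacheMethod });
    try {
      await worker.load();
      await worker.loadLanguage(lang);
      await worker.initialize(lang);
      return worker; // the caller is responsible for worker.terminate()
    } catch (err) {
      lastError = err;
      await worker.terminate(); // discard the failed worker before retrying
    }
  }
  // Every cacheMethod failed: surface the last error to the caller.
  throw lastError;
}

// Usage (assumption: createWorker imported from "tesseract.js"):
//   const worker = await loadWithFallback(createWorker, "eng");
```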
It looks like this is the offending part: when a DOMException is encountered the promise is never rejected. I will edit it so that all errors lead to a rejected promise.
https://github.com/naptha/tesseract.js/blob/bce7cd84fe823ca970854fe9c2c76d0c75051447/src/worker-script/index.js#L147-L153
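To illustrate the failure mode for readers who don't want to dig through the source: the snippet below is a generic sketch of the pattern, not the actual Tesseract.js code. StorageError is a hypothetical stand-in for the DOMException, and loadBuggy / loadFixed are made-up names.

```javascript
// Generic illustration only; this is NOT the tesseract.js source.
// StorageError is a hypothetical stand-in for a DOMException.
class StorageError extends Error {}

// Buggy pattern: one error type is silently ignored, so neither resolve
// nor reject is ever called and the returned promise stays pending.
function loadBuggy(task) {
  return new Promise((resolve, reject) => {
    task().then(resolve).catch(err => {
      if (err instanceof StorageError) {
        // swallowed: the caller's await never returns
      } else {
        reject(err);
      }
    });
  });
}

// Fixed pattern: every error leads to a rejection the caller can catch.
function loadFixed(task) {
  return new Promise((resolve, reject) => {
    task().then(resolve).catch(reject);
  });
}
```

With the buggy version an awaiting caller simply stops at "loading language traineddata"; with the fixed version the same failure surfaces as a catchable error.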
I updated the master branch so that every error now leads to a message/promise rejection. This will be reflected in the next npm release (3.0.4). If anybody using version >=3.0.4 encounters either a non-resolving promise from worker.loadLanguage or an error due to an underlying bug, they should open a new issue.
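If you need to tell a slow download apart from a promise that never settles (for example on a version before 3.0.4), wrapping the call in a timeout is one option. A minimal sketch; withTimeout is a hypothetical helper, not part of tesseract.js, and the 30000 ms in the usage line is an arbitrary value.

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `ms`.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage (assumption: `worker` comes from createWorker as in this thread):
//   await withTimeout(worker.loadLanguage("eng"), 30000, "loadLanguage");
```

Note that the timeout only stops the caller from waiting; it does not cancel the underlying work, so terminating the worker afterwards is still advisable.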