tesseract.js

Standalone HTML with offline Tesseract is not working

Open javadevelopment4 opened this issue 4 years ago • 7 comments

I would like to have a standalone HTML file that accepts an image as well as a language, and processes the image without any web server.

I have created the HTML with all libraries in a local folder. However, it fails to work.

Can you please help me with the HTML code so that I can use it standalone?

Why can't it run without a server? I want to do the conversion offline on my local laptop, which should not require a server.

javadevelopment4 avatar Jun 21 '20 05:06 javadevelopment4

From what I see, it progressively downloads the worker scripts.

Osmiogrzesznik avatar Dec 20 '20 09:12 Osmiogrzesznik

Try this: https://github.com/naptha/tesseract.js/blob/master/docs/local-installation.md

Osmiogrzesznik avatar Dec 20 '20 09:12 Osmiogrzesznik

@Osmiogrzesznik Thank you, I have fixed this issue using webpack.

const MergeIntoSingleFilePlugin = require("webpack-merge-and-include-globally");

module.exports = {
  // ...rest of the webpack config
  plugins: [
    new MergeIntoSingleFilePlugin({
      files: {
        // Emit the worker and core scripts into the build output folder.
        "tesseract-worker.min.js": [
          "./node_modules/tesseract.js/dist/worker.min.js",
        ],
        "tesseract-core.wasm.js": [
          "./node_modules/tesseract.js-core/tesseract-core.wasm.js",
        ],
      },
    }),
  ],
};

Then, wherever the worker is created:

const worker = createWorker({
  workerPath: location.href + "/tesseract-worker.min.js",
  langPath: "./lang-data", // downloaded trained data files.
  corePath: location.href + "/tesseract-core.wasm.js"
});
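A note on the paths above: `location.href + "/..."` only produces a usable URL when the page address is a directory without a trailing slash. If the address ends in a file name such as `index.html`, the result becomes `.../index.html/tesseract-worker.min.js`. A minimal sketch of an alternative, assuming the scripts sit next to the page, is to resolve them with the standard `URL` constructor. The helper name `resolveAsset` is hypothetical and not part of tesseract.js.

```javascript
// Hypothetical helper: resolve a file name relative to the page URL.
// Uses only the standard WHATWG URL API (available in browsers and Node).
function resolveAsset(fileName, pageUrl) {
  return new URL(fileName, pageUrl).href;
}

// Page address ending in a file name: the file name is replaced.
console.log(resolveAsset("tesseract-worker.min.js", "http://localhost:8080/app/index.html"));
// → http://localhost:8080/app/tesseract-worker.min.js

// Page address ending in a slash: the file name is appended.
console.log(resolveAsset("tesseract-core.wasm.js", "http://localhost:8080/app/"));
// → http://localhost:8080/app/tesseract-core.wasm.js
```

In the browser this would be called as `resolveAsset("tesseract-worker.min.js", location.href)`. Note that a directory address without a trailing slash (for example `http://localhost:8080/app`) resolves to the parent folder, so the original string concatenation is the better fit for that one case.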

yaswanthsvist avatar Sep 03 '21 17:09 yaswanthsvist

@yaswanthsvist Can you please attach the full HTML file for reference and testing? Thank you.

brightinnovator avatar Sep 04 '21 18:09 brightinnovator

@brightinnovator Unfortunately, my Tesseract processing code is not inside index.html.

Copy the files below from node_modules to the directory where your index.html is located, and reference them from your code:

cp ./node_modules/tesseract.js/dist/worker.min.js  ./path_where_index.html_is_present/
cp ./node_modules/tesseract.js-core/tesseract-core.wasm.js  ./path_where_index.html_is_present/

In your index.html:

<script>
 (async ()=>{
  // Assumes tesseract.min.js was loaded by an earlier <script> tag,
  // which exposes the global Tesseract object.
  const { createWorker } = Tesseract;
  const worker = createWorker({
    workerPath: "./worker.min.js",
    langPath: "./lang-data", // downloaded trained data files.
    corePath: "./tesseract-core.wasm.js"
  });
  await worker.load();
  await worker.loadLanguage("eng");
  await worker.initialize("eng");
  
  /* do whatever you want with the tesseract worker here */
 
 })()
</script>

This should mostly be enough to get you up and running offline.
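For the "do whatever you want" step, here is a minimal sketch of what the recognition call could look like, assuming the same load / loadLanguage / initialize sequence as the snippet above. The function name `recognizeImage` is hypothetical; `createWorker` is passed in as a parameter so the sketch does not depend on how tesseract.js was loaded, and the three paths are the same local assumptions as above.

```javascript
// Hypothetical wrapper around the worker lifecycle shown above.
// createWorker: the tesseract.js factory (e.g. Tesseract.createWorker).
// image: anything worker.recognize accepts (File, Blob, URL, <img>, ...).
async function recognizeImage(createWorker, image, lang = "eng") {
  // "await" is harmless if createWorker returns the worker synchronously.
  const worker = await createWorker({
    workerPath: "./worker.min.js",
    langPath: "./lang-data", // local folder with trained data files
    corePath: "./tesseract-core.wasm.js",
  });
  try {
    await worker.load();
    await worker.loadLanguage(lang);
    await worker.initialize(lang);
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    // Always release the worker, even if recognition fails.
    await worker.terminate();
  }
}
```

In a page this might be wired to a file input and a language dropdown, e.g. `recognizeImage(Tesseract.createWorker, fileInput.files[0], langSelect.value)`, where `fileInput` and `langSelect` are hypothetical element references. Newer tesseract.js releases changed this API, so the sequence may need adjusting for them.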

yaswanthsvist avatar Sep 05 '21 04:09 yaswanthsvist

Can you please also attach a sample HTML file for the above, so I can test it and implement it on the website?

brightinnovator avatar Sep 12 '21 11:09 brightinnovator

Let me know if this issue is still active, and if so, what the use case for an offline .html version would be. A standalone/offline .html file seems entirely possible from a technical perspective. However, since the base Tesseract program can be compiled to run on almost any device (and runs faster), it's unclear to me what the utility of an offline version of the wasm port would be.

Note: I do not believe the version marked "offline" was ever intended to be run without an http server running--rather I think the idea is that it can be run on a local server (i.e. without access to the internet).
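To illustrate the local-server setup, here is a minimal sketch of a static file server using only Node's built-in modules. It is one possible way to serve the folder containing index.html, the worker and core scripts, and the language data without internet access, not something prescribed by tesseract.js. The names `contentTypeFor` and `createStaticServer` are hypothetical.

```javascript
const http = require("http");
const fs = require("fs");
const path = require("path");

// Content types for the kinds of files used in this thread.
const MIME_TYPES = {
  ".html": "text/html",
  ".js": "text/javascript",
  ".wasm": "application/wasm",
  ".gz": "application/gzip",
};

function contentTypeFor(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return MIME_TYPES[ext] || "application/octet-stream";
}

// Returns an http.Server that serves files from rootDir (not yet listening).
function createStaticServer(rootDir) {
  const root = path.resolve(rootDir);
  return http.createServer((req, res) => {
    let urlPath;
    try {
      urlPath = decodeURIComponent(new URL(req.url, "http://localhost").pathname);
    } catch (err) {
      res.writeHead(400);
      res.end("Bad request");
      return;
    }
    const filePath = path.join(root, urlPath === "/" ? "index.html" : urlPath);
    // Refuse anything that resolves outside the served folder.
    if (!filePath.startsWith(root + path.sep)) {
      res.writeHead(403);
      res.end("Forbidden");
      return;
    }
    fs.readFile(filePath, (err, data) => {
      if (err) {
        res.writeHead(404);
        res.end("Not found");
        return;
      }
      res.writeHead(200, { "Content-Type": contentTypeFor(filePath) });
      res.end(data);
    });
  });
}
```

Calling `createStaticServer(".").listen(8080)` from the folder that holds index.html and then opening http://localhost:8080/ would serve everything from the local machine; the port number is an arbitrary choice.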

Balearica avatar Aug 09 '22 04:08 Balearica

Closing as stale. Feel free to reopen if you believe there is a compelling use case for a fully offline .html version.

Balearica avatar Aug 29 '22 00:08 Balearica