NodeJS Local Files Only - Headers Not Defined & Incorrect Path Splitters

Open axrati opened this issue 2 years ago • 17 comments

System Info

Windows 10 - 10.0.19045 Build 19045 Alienware m17 R3 CPU - Intel i7-10750H

Node version: v16.14.2

main.mjs:

import { pipeline } from '@xenova/transformers';

let pipe = await pipeline('feature-extraction','gte-small',{local_files_only:true});
let out = await pipe('Hey model! Respond to me!');

Package.json:

{
  "name": "js-hf",
  "version": "1.0.0",
  "description": "",
  "main": "main.mjs",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "@xenova/transformers": "^2.14.0"
  }
}

Clone this repository directly into root of project: https://huggingface.co/Supabase/gte-small

Your final project layout will look like this:

--${YOUR_PROJ_NAME}
----- gte-small
----- node_modules
----- main.mjs
----- package-lock.json
----- package.json

Environment/Platform

  • [ ] Website/web-app
  • [ ] Browser extension
  • [X] Server-side (e.g., Node.js, Deno, Bun)
  • [ ] Desktop app (e.g., Electron)
  • [ ] Other (e.g., VSCode extension)

Description

When loading a model from local files, the library still appears to attempt HTTP requests. The expected behavior is that when local_files_only is true, only local files are used.

Secondly, the path used to load assets looks incorrect on Windows: it uses / instead of \ for the model assets. It also doesn't seem to distinguish between relative and absolute paths. Perhaps that needs to change as well?

Error output:

Axrati@DESKTOP-H8KG7FT MINGW64 ~/Desktop/Code/js-hf
$ node main.mjs
Unable to load from local path "C:\Users\Axrati\Desktop\Code\js-hf\node_modules\@xenova\transformers\models\/gte-small/tokenizer.json": "ReferenceError: Headers is not defined"
Unable to load from local path "C:\Users\Axrati\Desktop\Code\js-hf\node_modules\@xenova\transformers\models\/gte-small/tokenizer_config.json": "ReferenceError: Headers is not defined"
Unable to load from local path "C:\Users\Axrati\Desktop\Code\js-hf\node_modules\@xenova\transformers\models\/gte-small/config.json": "ReferenceError: Headers is not defined"
file:///C:/Users/Axrati/Desktop/Code/js-hf/node_modules/@xenova/transformers/src/utils/hub.js:462
                    throw Error(`\`local_files_only=true\` or \`env.allowRemoteModels=false\` and file was not found locally at "${localPath}".`);
                          ^

Error: `local_files_only=true` or `env.allowRemoteModels=false` and file was not found locally at "C:\Users\Axrati\Desktop\Code\js-hf\node_modules\@xenova\transformers\models\/gte-small/tokenizer.j
    at getModelFile (file:///C:/Users/Axrati/Desktop/Code/js-hf/node_modules/@xenova/transformers/src/utils/hub.js:462:27)
    at async getModelJSON (file:///C:/Users/Axrati/Desktop/Code/js-hf/node_modules/@xenova/transformers/src/utils/hub.js:575:18)
    at async Promise.all (index 0)
    at async loadTokenizer (file:///C:/Users/Axrati/Desktop/Code/js-hf/node_modules/@xenova/transformers/src/tokenizers.js:61:18)
    at async Function.from_pretrained (file:///C:/Users/Axrati/Desktop/Code/js-hf/node_modules/@xenova/transformers/src/tokenizers.js:4296:50)
    at async Promise.all (index 0)
    at async loadItems (file:///C:/Users/Axrati/Desktop/Code/js-hf/node_modules/@xenova/transformers/src/pipelines.js:3115:5)
    at async pipeline (file:///C:/Users/Axrati/Desktop/Code/js-hf/node_modules/@xenova/transformers/src/pipelines.js:3055:21)
    at async file:///C:/Users/Axrati/Desktop/Code/js-hf/main.mjs:5:12

As you can see, the "Unable to load from local path" message references node_modules\@xenova\transformers\models\/gte-small/tokenizer.json, which mixes \ and / separators and does not look like a valid Windows path. It also doesn't respect the relative path: as the project layout under System Info shows, the model is a directory in the root of the project, yet the lookup goes through the library's own folder in node_modules.
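
The mixed separators in that message are what plain string concatenation produces when a Windows-style base directory is combined with a forward-slash model path. Here is a minimal sketch of the difference, using Node's path.win32 so it behaves the same on any OS. naiveJoin is a hypothetical stand-in rather than the library's actual helper, and the base directory is shortened for illustration:

```javascript
const path = require('node:path');

// Hypothetical stand-in for string-based joining; not the library's real code.
function naiveJoin(...parts) {
  return parts.join('/');
}

// Shortened, illustrative base directory that ends in a backslash.
const base = 'C:\\js-hf\\node_modules\\@xenova\\transformers\\models\\';

// Concatenation keeps the trailing "\" and then adds "/" separators.
const mixed = naiveJoin(base, 'gte-small', 'tokenizer.json');

// path.win32.join collapses duplicate separators and uses "\" throughout.
const normalized = path.win32.join(base, 'gte-small', 'tokenizer.json');

console.log(mixed);
console.log(normalized);
```

Node's fs functions on Windows generally accept forward slashes too, so the mixed form looks odd but is not necessarily what makes the lookup fail.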

If you look at the code in https://github.com/xenova/transformers.js/blob/main/src/utils/hub.js, you can see on lines 55-56 that the FileResponse constructor instantiates Headers. This leads me to believe that even when the first two conditions in getFile are met (env.useFS && !isValidHttpUrl(urlOrPath)), it still executes HTTP-oriented code that a local file read doesn't need.
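
The ReferenceError itself can be checked independently of the library: Headers is a global that Node 16 does not define. Below is a small sketch of such a check. describeRuntime is a hypothetical helper name, not part of Transformers.js, and its parameters exist only so the check can be exercised without switching Node versions:

```javascript
// Report whether a runtime exposes the global Headers class.
// `scope` and `version` default to the current process but can be overridden.
function describeRuntime(scope = globalThis, version = process.versions.node) {
  const major = Number.parseInt(version.split('.')[0], 10);
  return {
    major,
    hasHeaders: typeof scope.Headers === 'function',
  };
}

const runtime = describeRuntime();
if (!runtime.hasHeaders) {
  // Any code that calls `new Headers()` will throw a ReferenceError here.
  console.warn(`Node ${runtime.major}: global Headers is not defined`);
} else {
  console.log(`Node ${runtime.major}: global Headers is available`);
}
```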

I am happy to help create a PR for this! Please reach out and let me know. Would be helpful to catch up with someone on the team for repo direction/etc.

Reproduction

Follow the steps in System Info / Description, then run:

npm install
node main.mjs

axrati avatar Jan 13 '24 17:01 axrati

Node version: v16.14.2

Hi there 👋 Transformers.js requires Node.js v18+ to function correctly. Since Node 16 has reached EOL, we will not be adding support for it in future. See here for more information.

xenova avatar Jan 13 '24 17:01 xenova

Hello! - Thanks for the quick response! :)

I upgraded to Node v20.11.0 and reinstalled node_modules, but I'm still seeing the path error. When I pass an absolute path, it still seems to be appended to the library's own models directory. Happy to help on this if you'd like!

$ node main.mjs
file:///C:/Users/Axrati/Desktop/Code/js-hf/node_modules/@xenova/transformers/src/utils/hub.js:462
                    throw Error(`\`local_files_only=true\` or \`env.allowRemoteModels=false\` and file was not found locally at "${localPath}".`);
                          ^

Error: `local_files_only=true` or `env.allowRemoteModels=false` and file was not found locally at "C:\Users\Axrati\Desktop\Code\js-hf\node_modules\@xenova\transformers\models\/gte-small/tokenizer.json".
    at getModelFile (file:///C:/Users/Axrati/Desktop/Code/js-hf/node_modules/@xenova/transformers/src/utils/hub.js:462:27)
    at async getModelJSON (file:///C:/Users/Axrati/Desktop/Code/js-hf/node_modules/@xenova/transformers/src/utils/hub.js:575:18)
    at async Promise.all (index 0)
    at async loadTokenizer (file:///C:/Users/Axrati/Desktop/Code/js-hf/node_modules/@xenova/transformers/src/tokenizers.js:61:18)
    at async AutoTokenizer.from_pretrained (file:///C:/Users/Axrati/Desktop/Code/js-hf/node_modules/@xenova/transformers/src/tokenizers.js:4296:50)
    at async Promise.all (index 0)
    at async loadItems (file:///C:/Users/Axrati/Desktop/Code/js-hf/node_modules/@xenova/transformers/src/pipelines.js:3115:5)
    at async pipeline (file:///C:/Users/Axrati/Desktop/Code/js-hf/node_modules/@xenova/transformers/src/pipelines.js:3055:21)
    at async file:///C:/Users/Axrati/Desktop/Code/js-hf/main.mjs:4:12

Node.js v20.11.0

axrati avatar Jan 13 '24 18:01 axrati

I was able to fix this with the following code:

import { pipeline, env } from '@xenova/transformers';
env.localModelPath = './';
let pipe = await pipeline('feature-extraction','gte-small',{local_files_only:true});

I think changing the semantics of this may benefit more diverse projects. I am building an Electron app that lets users point to and use models wherever they may be on their computer.

If I start setting env.localModelPath, any time I try to reference an arbitrary model on their computer, it needs to be relative to the Electron app's path (or whatever I set there). I would much rather have the ability to provide both relative and absolute paths. Perhaps a config option (relative vs absolute) with a default, alongside local_files_only?
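
Until something like that exists, one possible workaround is to split a user-chosen folder into a base directory and a model id, then feed those to env.localModelPath and pipeline. The sketch below covers only the splitting step. splitModelLocation is a hypothetical helper, and it is an untested assumption that the library resolves the resulting pair correctly on every platform:

```javascript
const path = require('node:path');

// Hypothetical helper: turn a user-chosen model folder (relative or absolute)
// into a base directory plus a model id. `p` lets callers pick path.win32 or
// path.posix explicitly; it defaults to the current platform's rules.
function splitModelLocation(location, baseDir = process.cwd(), p = path) {
  // Relative locations are resolved against baseDir; absolute ones win.
  const absolute = p.resolve(baseDir, location);
  return {
    localModelPath: p.dirname(absolute),
    modelId: p.basename(absolute),
  };
}

console.log(splitModelLocation('./gte-small'));

// Assumed usage with the library (not verified here):
//   const { localModelPath, modelId } = splitModelLocation(userChoice);
//   env.localModelPath = localModelPath;
//   await pipeline('feature-extraction', modelId, { local_files_only: true });
```

This keeps env.localModelPath as the single knob while letting the caller accept either kind of path from the user.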

Happy to make the changes myself, please let me know your thoughts!

axrati avatar Jan 13 '24 19:01 axrati

@xenova - bump! Let me know and I will start a PR for this

axrati avatar Jan 20 '24 00:01 axrati

I have the same issue too. My model is there, but the library cannot find that path:

env.localModelPath = new URL('../../../../../', import.meta.url).pathname;

This does not work either:

env.localModelPath = '../../../../../';

Error: `local_files_only=true` or `env.allowRemoteModels=false` and file was not found locally at "/C:/Users/hiepx/small-cosmos/colbert-ir/colbertv2.0/tokenizer.json".
    at getModelFile (file:///C:/Users/hiepx/small-cosmos/losa/node_modules/.pnpm/@[email protected]/node_modules/@xenova/transformers/src/utils/hub.js:462:27)

But the file is actually there. @axrati, I think you should create a PR for this.
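
The leading slash in "/C:/Users/hiepx/..." in the error above is what URL.pathname returns for a file: URL on a Windows drive. Node's fileURLToPath converts such a URL to a native path instead. Here is a minimal sketch, where the module URL is an illustrative stand-in for import.meta.url and the losa/src/main.mjs part is assumed. It uses require so it runs as a standalone script; in an .mjs file, the equivalent import from 'node:url' applies:

```javascript
const { fileURLToPath } = require('node:url');

// Illustrative stand-in for import.meta.url in a module on a Windows drive.
const moduleUrl = 'file:///C:/Users/hiepx/small-cosmos/losa/src/main.mjs';
const target = new URL('../../', moduleUrl);

// URL.pathname is a URL path, so the drive letter keeps a leading slash.
console.log(target.pathname); // "/C:/Users/hiepx/small-cosmos/"

// fileURLToPath returns the platform's native form; on Windows that means
// no leading slash and backslash separators.
console.log(fileURLToPath(target));
```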

hiepxanh avatar Jan 24 '24 09:01 hiepxanh

@hiepxanh Can you provide sample code and your directory structure so I can use it as a test case alongside mine?

axrati avatar Jan 24 '24 19:01 axrati

@xenova @hiepxanh - I have found a quick patch. unable to push a branch up though?

$ git push origin explicit-path
remote: Permission to xenova/transformers.js.git denied to axrati.
fatal: unable to access 'https://github.com/xenova/transformers.js.git/': The requested URL returned error: 403

axrati avatar Jan 25 '24 00:01 axrati

@xenova - the change here isn't major, and it doesn't add absolute vs relative path support. It's an issue with how localPath & requestURL are derived. Please let me open a branch to submit a PR!

axrati avatar Jan 26 '24 03:01 axrati

@xenova Bump!

axrati avatar Jan 28 '24 17:01 axrati

@xenova @hiepxanh - I have found a quick patch. unable to push a branch up though?

$ git push origin explicit-path
remote: Permission to xenova/transformers.js.git denied to axrati.
fatal: unable to access 'https://github.com/xenova/transformers.js.git/': The requested URL returned error: 403

Hi there 👋 Feel free to fork the repository, then submit a pull request. In that way, I can review your changes.

xenova avatar Jan 29 '24 09:01 xenova

Same issue for me!

TrumanDu avatar Feb 20 '24 03:02 TrumanDu

@xenova , @hiepxanh, @lsb , @TrumanDu

Sorry for the delay on this, I have been working on other projects. I opened a PR from my fork for review: https://github.com/xenova/transformers.js/pull/602

axrati avatar Feb 22 '24 23:02 axrati

@axrati thanks for your contribution, I really need your PR.

TrumanDu avatar Mar 06 '24 05:03 TrumanDu

@TrumanDu @hiepxanh @lsb @xenova

No problem TrumanDu :) ... xenova - can you please check the PR? Small but effective change! https://github.com/xenova/transformers.js/pull/602

axrati avatar Mar 13 '24 18:03 axrati

@xenova - reminder for this PR! if you can approve the checks to run it'd help ~

axrati avatar Mar 26 '24 15:03 axrati

same issue, any update now?

wujohns avatar Apr 17 '24 15:04 wujohns

@xenova @wujohns

I have a PR open to address this! Need it to be checked:

https://github.com/xenova/transformers.js/pull/602

axrati avatar Apr 17 '24 17:04 axrati