
Installation fails on Linux, when CUDA toolkit is available, due to ONNXRuntime NuGet error on `onnxruntime-node` package post-install script

Open serbulovv opened this issue 7 months ago • 7 comments

Hello! I'm trying to install echogarden on Debian 12. I had previously installed the package (2.7.0) successfully, but now I'm getting installation errors (I tried versions 2.7.0, 2.6.0, and 2.5.2).

Specifically, when installing [email protected], I get the following error:

Error: Failed to find runtimes/win-x64/native/libonnxruntime_providers_cuda.so in NuGet package

npm ERR! code 1
npm ERR! path /usr/local/lib/node_modules/echogarden/node_modules/onnxruntime-node
npm ERR! command failed
npm ERR! command sh -c node ./script/install
npm ERR! Downloading https://api.nuget.org/v3-flatcontainer/microsoft.ml.onnxruntime.gpu.linux/1.22.0/microsoft.ml.onnxruntime.gpu.linux.1.22.0.nupkg
npm ERR! /usr/local/lib/node_modules/echogarden/node_modules/onnxruntime-node/script/install-utils.js:176
npm ERR!               throw new Error(`Failed to find ${pathInPackage} in NuGet package`);
npm ERR!                     ^
...
Error: Failed to find runtimes/win-x64/native/libonnxruntime_providers_cuda.so in NuGet package

OS: Debian 12 (Bookworm)
Node.js: v18.19.0
NPM: 10.x
echogarden versions tried: 2.7.0, 2.6.0, 2.5.2 (all fail the same way)

serbulovv avatar May 14 '25 08:05 serbulovv

This seems to be an install-time issue with onnxruntime-node, which tries to fetch some of the CUDA shared libraries it needs to support the cuda provider on Linux. It does this automatically during installation when it detects the CUDA toolkit, and possibly cuDNN, installed globally on your system. It's not related to Echogarden.

NuGet is a package manager for .NET and is not related to anything in Echogarden, which only uses npm.

The thing is, Echogarden release v2.7.0 depends on onnxruntime-node v1.21.1 but your error message shows it is fetching some sort of NuGet package versioned 1.22.0, which will only be supported in the next version of Echogarden (which I'm working on now).

I don't know the details of what onnxruntime-node does during installation. Seems to be an issue with the package.

Maybe it's fixed in 1.22.0. I don't know. We'll see when I publish the next version.

Anyway, you can try removing the CUDA toolkit / cuDNN from the system path when installing (I'm not sure how it detects them; maybe you need to remove or hide them completely). Maybe that will bypass the issue.

rotemdan avatar May 14 '25 09:05 rotemdan

I found this issue in the microsoft/onnxruntime repository. Could be related.

Another possible related issue is this.

rotemdan avatar May 14 '25 10:05 rotemdan

I've just published v2.8.0, which now uses onnxruntime-node v1.22.0. See if that changes anything.

rotemdan avatar May 15 '25 10:05 rotemdan

After testing more, I'm getting the same error when trying to install Echogarden (also v2.8.0) in WSL2, when CUDA is available.

It seems like onnxruntime-node has some sort of issue with its post-install script, that is impacting even older versions.

This is preventing Echogarden from being installed, so it's a major issue.

Oddly, I couldn't find a recent issue about this particular post-install error in the onnxruntime issue tracker. I'll keep searching, and also look at the source code of the onnxruntime-node package to try to find out why this is happening.

Edit: isolating the issue

You can isolate the issue by trying to npm install only the onnxruntime-node package locally:

mkdir temp
cd temp
npm install onnxruntime-node

When installing like this, I get the same issue, so I can confirm it's not related to the Echogarden context. It happens independently.

rotemdan avatar May 15 '25 11:05 rotemdan

On April 16 2025, there was a pull request merged to allow installing DLLs from Nuget feed.

Here is its description:


Description

This PR makes changes to the installation script of ONNX Runtime Node.js binding.

Background

Because of the max size limit of NPM registry, the Node.js binding NPM package does not include some of the binaries, eg. the CUDA EP binaries for Linux/x64.

To make it working smoothly for CUDA EP users on Linux/x64, we need a script to download the binaries from somewhere in the process of NPM installation.

Problems

Before this PR, the script downloads the binaries from GitHub Release. This is working well but have 2 problems:

  • There is a gap between the release of the binaries and the release of the NPM package. The GitHub release is always the final step of the release process. Usually there are a few hours to a few days delay between the release of the NPM package and the release of the binaries on GitHub release.
  • GitHub release does not work with dev/nightly.

Solution

We find that using Nuget feed perfectly resolves the above problems:

  • anonymous download is allowed
  • Nuget publish can be adjusted to be prior to NPM publish in the release process
  • ONNX Runtime has a nightly Nuget feed

The PR changes to use Nuget package for downloading the binaries.


It's likely the issue is related to this, since I don't remember seeing anything similar before, or NuGet packages being involved at all.

It's really odd no one, within a month, has reported any issue! I'll open an issue if needed.

Edit: Temporary Workaround

It turns out there is an environment variable that you can set to prevent the CUDA Execution Provider binaries from being downloaded during npm install:

ONNXRUNTIME_NODE_INSTALL=skip npm install echogarden -g

The ONNXRUNTIME_NODE_INSTALL naming is confusing. It doesn't prevent the package from being installed, just the CUDA EP binaries.

The source code for the post-install script it uses is here (it includes some further information).

Update

I opened an issue on the onnxruntime issue tracker.

rotemdan avatar May 15 '25 11:05 rotemdan

The cause of the issue seems to be incorrect file paths in the onnxruntime-node post-install manifest, pointing to win-x64 instead of Linux directories.
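
To illustrate the kind of path mix-up described above, here is a minimal sketch of how a platform-specific path inside a NuGet package might be built. This is not the actual onnxruntime-node install script: `nativeLibraryPath` and `RID_OS_BY_PLATFORM` are hypothetical names, and the `runtimes/<os>-<arch>/native/` layout (including `linux-x64` as the Linux directory name) is an assumption based on the path shown in the error message.

```javascript
// Hypothetical helper, NOT the real onnxruntime-node post-install code.
// Assumption: native files live under runtimes/<os>-<arch>/native/ in the package.
const RID_OS_BY_PLATFORM = { win32: 'win', linux: 'linux', darwin: 'osx' };

function nativeLibraryPath(platform, arch, fileName) {
  const os = RID_OS_BY_PLATFORM[platform];
  if (!os) {
    throw new Error(`Unsupported platform: ${platform}`);
  }
  return `runtimes/${os}-${arch}/native/${fileName}`;
}

// A Linux .so file looked up for a Linux host (what one would expect).
console.log(nativeLibraryPath('linux', 'x64', 'libonnxruntime_providers_cuda.so'));

// The failing lookup from the error message has this shape instead:
// a Linux .so file requested under the Windows directory.
console.log(nativeLibraryPath('win32', 'x64', 'libonnxruntime_providers_cuda.so'));
```

The second call reproduces the path from the error message, which pairs a `.so` file name with a `win-x64` directory.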

After testing further, it turns out the issue doesn't actually occur on v1.21.1 or earlier, only v1.22.0.

The transition to NuGet, which was committed on April 16 2025, landed in v1.22.0, which was published on May 10 2025. I verified in the source code of v1.21.1 that it didn't use NuGet in the post-install script.

The original report by @serbulovv must have involved some sort of incorrect package versioning state in npm that possibly overrode the version specified in Echogarden (v1.21.1) with the newer version.

So when I released v2.8.0, it actually introduced the issue rather than fixing it, since v2.7.0 didn't have it. I verified this.

So now, in v2.8.4, I rolled back to onnxruntime-node v1.21.1, and I'll only upgrade to v1.22.0 once the issue is fixed.

rotemdan avatar May 17 '25 07:05 rotemdan

Actually, the error was mine and it was a very significant one! I had a misconception about a package.json version like ^1.21.1 applying only up to bug-fix (patch) versions like 1.21.2 or 1.21.3, but actually, it applied to all versions up to version 2.0.0, including 1.22.0, 1.23.0, etc. That's why this was happening. The update I made in v2.8.4 had no effect.

What I should have been using instead was ~, so now in v2.8.5 I finally fixed the entire package.json file to use ~ instead of ^.
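
The difference between the two operators can be sketched with a small, simplified range check. This is not npm's actual resolver (that is the `semver` package, which handles many more cases); `parseVersion` and `satisfies` are illustrative names, and the sketch only covers plain `^X.Y.Z` and `~X.Y.Z` ranges with a major version of 1 or higher.

```javascript
// Simplified sketch: handles only "^X.Y.Z" and "~X.Y.Z" with X >= 1.
// Pre-release tags, 0.x versions, and compound ranges are out of scope.

function parseVersion(version) {
  const [major, minor, patch] = version.split('.').map(Number);
  return { major, minor, patch };
}

function satisfies(version, range) {
  const operator = range[0];
  const base = parseVersion(range.slice(1));
  const v = parseVersion(version);

  // Both operators pin the major version and reject anything below the base.
  if (v.major !== base.major) return false;
  const atLeastBase =
    v.minor > base.minor || (v.minor === base.minor && v.patch >= base.patch);
  if (!atLeastBase) return false;

  if (operator === '^') return true;                   // later minors and patches
  if (operator === '~') return v.minor === base.minor; // later patches only
  throw new Error(`Unsupported range operator: ${operator}`);
}

for (const candidate of ['1.21.1', '1.21.2', '1.22.0', '2.0.0']) {
  console.log(
    candidate,
    'caret:', satisfies(candidate, '^1.21.1'),
    'tilde:', satisfies(candidate, '~1.21.1')
  );
}
```

Under these rules, 1.22.0 matches `^1.21.1` but not `~1.21.1`, which is consistent with a fresh install picking up onnxruntime-node 1.22.0 even though the dependency was written as 1.21.1.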

It's unbelievable I've been working on this library for more than 2 years and no one has ever suggested I should make this change! Actually, I've been working with Node.js for about 12 to 13 years now, and the significant difference between ^ and ~ had completely slipped my mind (it's possible I knew about it at some point and later forgot).

The reason is likely that tools like npm and npm-check-updates kept "pushing" ^ by default, which is basically bad for producing stable, reproducible software! I simply assumed ^ meant "bug-fixes only" (patches only) because it made no sense that npm would use it by default, since npm and its author are strong proponents of semantic versioning! This is so odd to me! How can this be?

Anyway, here is an npm version cheatsheet that clearly states that ^ allows both minor and patch releases, and ~ allows patches only. It's likely that many people other than myself aren't fully aware of that; otherwise someone might have spotted the issue sooner.

Edit:

The problem actually goes way beyond myself; it affects the entire Node.js ecosystem. If you survey package.json files, in particular, say, those of the libraries I use internally in Echogarden, almost all of them use ^ and not ~ for dependency versioning, simply because npm writes it by default every single time you run npm install, and it becomes nearly impossible to manage unless you're strongly aware of the problem!
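
A survey like the one described above could be sketched as a small function that lists every dependency whose range starts with ^. `findCaretDependencies` is an illustrative name, and the package names and versions in the example object are made up for the demonstration.

```javascript
// Sketch: report dependencies in a parsed package.json object that use a caret range.
function findCaretDependencies(pkg) {
  const sections = [
    'dependencies',
    'devDependencies',
    'optionalDependencies',
    'peerDependencies',
  ];
  const found = [];
  for (const section of sections) {
    for (const [name, range] of Object.entries(pkg[section] ?? {})) {
      if (typeof range === 'string' && range.startsWith('^')) {
        found.push({ section, name, range });
      }
    }
  }
  return found;
}

// Example input; names and versions are illustrative only.
const examplePackage = {
  dependencies: { 'onnxruntime-node': '^1.21.1', 'example-lib': '~2.3.4' },
  devDependencies: { 'example-tool': '^5.0.0' },
};

console.log(findCaretDependencies(examplePackage));
```

To check a real project, the same function can be given `JSON.parse(fs.readFileSync('package.json', 'utf8'))`. Separately, npm has a `save-prefix` config setting that controls which prefix `npm install` writes into package.json, so setting it to `~` should avoid reintroducing ^ on later installs.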

Almost no one is aware of this! So even after I went and modified every ^ to ~ in my own package.json, I can't change the package.json files of the dependencies themselves, so if they use ^, old versions of echogarden may still not be fully reproducible (since dependencies of dependencies may cause newer, incompatible packages to be used) unless the user uses some other, stronger mechanism for reproducible dependency management like a package-lock.json file (I do publish that on the GitHub repository) or npm ci, but that may not always be available.

rotemdan avatar May 17 '25 08:05 rotemdan