Nexa CLI v0.2.37 inside Docker (Windows Snapdragon X Elite) – "Oops. Model failed to load." with no further diagnostics
Environment
- Host: Windows 11 on Snapdragon X Elite (32 GB RAM)
- Docker Desktop for Windows (WSL2 backend)
- Container base: debian:bookworm-slim
- Nexa CLI: v0.2.37 (installed via nexa-cli_linux_x86_64.sh)
- License: injected via Docker secret and applied with nexa config set license key/...
Container Setup
- Installs curl, ca-certificates, bash, xz-utils, nodejs, sox, ffmpeg; downloads Nexa CLI v0.2.37 and runs the installer
- Runs as user nexa with home /var/lib/nexa
- HTTP sidecar (node /srv/server.js) executes nexa infer
- Shared memory: shm_size: 4g
- License applied at startup (logs show [LICENSE] existing license detected)
- Models downloaded (nexa pull) and cached in /var/lib/nexa/.cache/nexa.ai
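For reference, the container settings above can be sketched as a compose fragment (service, secret, and path names are illustrative, not our actual file):

```yaml
services:
  nexa:
    build: ./nexa            # debian:bookworm-slim base + Nexa CLI installer
    user: nexa               # home: /var/lib/nexa
    shm_size: 4g             # shared memory raised to 4 GiB
    secrets:
      - nexa_license         # applied at startup via nexa config set license

secrets:
  nexa_license:
    file: ./secrets/nexa_license.txt   # hypothetical path to the key file
```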
Backend Integration
- Backend (Node/Express) calls Nexa over HTTP (http://nexa:18181 when containerized)
- Also tested http://host.docker.internal:18181 pointing at the native host nexa serve
Steps to Reproduce
1. Start the services:
   ```bash
   docker compose up -d nexa backend
   ```
2. Verify state inside the container:
   ```bash
   docker compose exec nexa nexa config list   # license present
   docker compose exec nexa nexa list          # models cached
   ```
3. Run inference:
   ```bash
   docker compose exec nexa sh -lc "nexa infer 'NexaAI/phi4-mini-npu-turbo' -p 'hello' --ngl 0 --max-tokens 64 --think=false"
   ```
4. Observe CLI output:
   ```
   ⚠️ Oops. Model failed to load.
   👉 Try these:
   - Verify your system meets the model's requirements.
   - Seek help in our discord or slack.
   ```
5. Exit code = 0, stderr empty, no additional logs even with NEXA_LOG_LEVEL=debug
6. Sidecar HTTP call returns 500 with the same banner in the output field and empty stderr
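To make the exit-code/stderr observation above concrete, the failing invocation can be wrapped so that exit status, stdout, and stderr are captured separately (a sketch; the `command -v` guard only exists so the snippet is runnable on machines without the CLI):

```shell
# Re-run the failing inference, splitting streams; per the report the banner
# arrives on stdout and the process still exits 0.
if command -v nexa >/dev/null 2>&1; then
  nexa infer 'NexaAI/phi4-mini-npu-turbo' -p 'hello' \
    --ngl 0 --max-tokens 64 --think=false >out.log 2>err.log
  echo "exit=$?"
  echo "stderr_bytes=$(wc -c < err.log)"   # 0 here means no diagnostics at all
else
  echo "exit=skipped"   # nexa CLI not installed on this machine
fi
```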
Diagnostics
- /diag endpoint returns:
  ```json
  {
    "time": "2025-10-02T09:16:02.945Z",
    "version": "NexaSDK Bridge Version: v1.0.17\nNexaSDK CLI Version: v0.2.37",
    "config": "license: key/...",
    "models": "... OmniNeural-4B ... phi4-mini-npu-turbo ..."
  }
  ```
- License cache: /var/lib/nexa/.cache/nexa.ai/nexa_sdk/config contains the key
- No crash logs or detailed errors recorded
Observed Behavior
- All attempts to load models inside the Debian container fail with the generic banner
- Same models run successfully when nexa serve --host 127.0.0.1:18181 is executed natively on Windows (host); backend connects to the host service and inference succeeds (NPU used)
Mitigations Attempted
- CPU fallback flags (--ngl 0, --think=false, low --max-tokens, sampler adjustments)
- Increased shared memory to 4 GiB
- Tested OmniNeural-4B and phi4-mini-npu-turbo
- Verified license (nexa config list) and cache state (nexa list)
- Ran inference with debug logging (NEXA_LOG_LEVEL=debug, --verbose) – still only the banner
Request to Nexa
- Does the Nexa CLI require runtime libraries (QNN, GPU drivers, etc.) that aren’t available inside this Debian container on Windows/ARM?
- Are there environment variables or configuration steps to enable CPU fallback/QNN emulation in containerized environments?
- Is there a CLI flag to produce more detailed diagnostics beyond the “Oops” banner?
- If containerized Nexa on Windows isn’t supported, please confirm so we can rely solely on host-side nexa serve.
We can supply the Dockerfile, docker-compose.yml, or additional logs on request.
> Nexa CLI: v0.2.37 (installed via nexa-cli_linux_x86_64.sh)
That build is for amd64, rather than the QNN (ARM) backend. AFAIK there is currently no Linux installer for Docker on ARM. See also #458.
cc @zhiyuan8
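The mismatch is easy to confirm from inside the container (a sketch; assumes `file` is installed and `nexa` is on PATH):

```shell
# Compare the container's architecture with the ELF architecture of the CLI.
uname -m                            # aarch64 for an ARM64 container under WSL2
file "$(command -v nexa)" || true   # "x86-64" here would confirm the wrong build
```

If the binary is x86_64, it is presumably running under Docker Desktop's amd64 emulation, which may explain why the CLI starts at all while the native model backends fail to load.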
Yo, how am I not getting paid for this? I made it
Keith cox
@iwr-redmond thank you for forwarding the issue to us
@keithem
(1) Please use the arm64 version, not the x86_64 version: https://public-storage.nexa4ai.com/nexa_sdk/downloads/nexa-cli_windows_arm64.exe
(2) Within Docker, could you also confirm whether the container can access the NPU? For GPUs, we need to pass --gpus all when starting Docker.
@IFSERPConsulting, the latest release now has a Linux ARM build (see #653).
@IFSERPConsulting If you are using IQ9, please try our Linux SDK with Docker support: https://docs.nexa.ai/nexa-sdk-docker/overview. If you are using a Windows laptop, please use our CLI or Python binding directly; Docker is not supported there yet: https://docs.nexa.ai/nexa-sdk-python/overview