Nexa CLI v0.2.37 inside Docker (Windows Snapdragon X Elite) – "Oops. Model failed to load." with no further diagnostics
Environment
- Host: Windows 11 on Snapdragon X Elite (32 GB RAM)
- Docker Desktop for Windows (WSL2 backend)
- Container base: debian:bookworm-slim
- Nexa CLI: v0.2.37 (installed via nexa-cli_linux_x86_64.sh)
- License: injected via Docker secret and applied with nexa config set license key/...
Container Setup
- Installs curl, ca-certificates, bash, xz-utils, nodejs, sox, ffmpeg; downloads Nexa CLI v0.2.37 and runs the installer
- Runs as user nexa with home /var/lib/nexa
- HTTP sidecar (node /srv/server.js) executes nexa infer
- Shared memory: shm_size: 4g
- License applied at startup (logs show [LICENSE] existing license detected)
- Models downloaded (nexa pull) and cached in /var/lib/nexa/.cache/nexa.ai
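For reference, the container settings above can be sketched as a compose fragment (service, secret, and path names are illustrative, not our actual file):

```yaml
services:
  nexa:
    build: ./nexa            # debian:bookworm-slim base + Nexa CLI installer
    user: nexa               # home: /var/lib/nexa
    shm_size: 4g             # shared memory raised to 4 GiB
    secrets:
      - nexa_license         # applied at startup via nexa config set license

secrets:
  nexa_license:
    file: ./secrets/nexa_license.txt   # hypothetical path to the key file
```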
Backend Integration
- Backend (Node/Express) calls Nexa over HTTP (http://nexa:18181 when containerized)
- Also tested http://host.docker.internal:18181 pointing at the native host nexa serve
Steps to Reproduce
1. Start the services:
   ```bash
   docker compose up -d nexa backend
   ```
2. Verify state inside the container:
   ```bash
   docker compose exec nexa nexa config list   # license present
   docker compose exec nexa nexa list          # models cached
   ```
3. Run inference:
   ```bash
   docker compose exec nexa sh -lc "nexa infer 'NexaAI/phi4-mini-npu-turbo' -p 'hello' --ngl 0 --max-tokens 64 --think=false"
   ```
4. Observe CLI output:
   ```
   ⚠️ Oops. Model failed to load.
   👉 Try these:
   - Verify your system meets the model's requirements.
   - Seek help in our discord or slack.
   ```
5. Exit code = 0, stderr empty, no additional logs even with NEXA_LOG_LEVEL=debug
6. Sidecar HTTP call returns 500 with the same banner in the output field and empty stderr
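To make the exit-code/stderr observation above concrete, the failing invocation can be wrapped so that exit status, stdout, and stderr are captured separately (a sketch; the `command -v` guard only exists so the snippet is runnable on machines without the CLI):

```shell
# Re-run the failing inference, splitting streams; per the report the banner
# arrives on stdout and the process still exits 0.
if command -v nexa >/dev/null 2>&1; then
  nexa infer 'NexaAI/phi4-mini-npu-turbo' -p 'hello' \
    --ngl 0 --max-tokens 64 --think=false >out.log 2>err.log
  echo "exit=$?"
  echo "stderr_bytes=$(wc -c < err.log)"   # 0 here means no diagnostics at all
else
  echo "exit=skipped"   # nexa CLI not installed on this machine
fi
```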
Diagnostics
- /diag endpoint returns:
  ```json
  {
    "time": "2025-10-02T09:16:02.945Z",
    "version": "NexaSDK Bridge Version: v1.0.17\nNexaSDK CLI Version: v0.2.37",
    "config": "license: key/...",
    "models": "... OmniNeural-4B ... phi4-mini-npu-turbo ..."
  }
  ```
- License cache: /var/lib/nexa/.cache/nexa.ai/nexa_sdk/config contains the key
- No crash logs or detailed errors recorded
Observed Behavior
- All attempts to load models inside the Debian container fail with the generic banner
- Same models run successfully when nexa serve --host 127.0.0.1:18181 is executed natively on Windows (host); backend connects to the host service and inference succeeds (NPU used)
Mitigations Attempted
- CPU fallback flags (--ngl 0, --think=false, low --max-tokens, sampler adjustments)
- Increased shared memory to 4 GiB
- Tested OmniNeural-4B and phi4-mini-npu-turbo
- Verified license (nexa config list) and cache state (nexa list)
- Ran inference with debug logging (NEXA_LOG_LEVEL=debug, --verbose) – still only the banner
Request to Nexa
- Does the Nexa CLI require runtime libraries (QNN, GPU drivers, etc.) that aren’t available inside this Debian container on Windows/ARM?
- Are there environment variables or configuration steps to enable CPU fallback/QNN emulation in containerized environments?
- Is there a CLI flag to produce more detailed diagnostics beyond the “Oops” banner?
- If containerized Nexa on Windows isn’t supported, please confirm so we can rely solely on host-side nexa serve.
We can supply the Dockerfile, docker-compose.yml, or additional logs on request.
> Nexa CLI: v0.2.37 (installed via nexa-cli_linux_x86_64.sh)
That build is for amd64, rather than the QNN (ARM) backend. AFAIK there is currently no Linux installer for Docker on ARM. See also #458.
cc @zhiyuan8
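The mismatch is easy to confirm from inside the container (a sketch; assumes `file` is installed and `nexa` is on PATH):

```shell
# Compare the container's architecture with the ELF architecture of the CLI.
uname -m                            # aarch64 for an ARM64 container under WSL2
file "$(command -v nexa)" || true   # "x86-64" here would confirm the wrong build
```

If the binary is x86_64, it is presumably running under Docker Desktop's amd64 emulation, which may explain why the CLI starts at all while the native model backends fail to load.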
Yo, how am I not getting paid for this? I made it
Keith cox
@iwr-redmond thank you for forwarding the issue to us
@keithem
(1) Please use the arm64 version, not the x86_64 version: https://public-storage.nexa4ai.com/nexa_sdk/downloads/nexa-cli_windows_arm64.exe
(2) Within Docker, could you also confirm whether the container can access the NPU? For GPUs, we need to pass --gpus all when starting Docker.
@IFSERPConsulting, the latest release now has a Linux ARM build (see #653).
@IFSERPConsulting If you are using IQ9, please try our Linux SDK with Docker support: https://docs.nexa.ai/nexa-sdk-docker/overview. If you are using a Windows laptop, please use our CLI or Python binding directly; Docker is not supported there yet: https://docs.nexa.ai/nexa-sdk-python/overview