nexa-sdk: nexa serve always uses the CPU instead of the GPU

nexa serve always runs inference on the CPU.

I have tested this with the DeepSeek OCR model.

Inference through the CLI works fine, but when the model is called through nexa serve --host 0.0.0.0:8000, it always uses the CPU.

Is there a fix for this?

NexaSDK Bridge Version: v1.0.31
NexaSDK CLI Version: v0.2.60

Nov 22 '25 11:11 parotech123

Hi, thanks for your feedback. Did you set ngl to 0 in your request? Please try a non-zero value, for example:

curl -X 'POST' \
  'http://localhost:8000/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "Qwen/Qwen3-1.7B-GGUF",
  "messages": [
    {
      "role": "user",
      "content": "What is the weather like in Boston today?"
    }
  ],
  "nctx": 4096,
  "max_completion_tokens": 2048,
  "ngl": 999,
  "image_max_length": 512
}'
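For callers using JavaScript rather than curl, the same request can be sketched as below. This is a minimal sketch, not the SDK's own client: buildChatRequest and sendChatRequest are hypothetical helper names, the field names are copied from the curl example above, and the base URL is assumed to match whatever nexa serve was started with. It relies on the global fetch available in Node 18+ and browsers.

```javascript
// Hypothetical helper: builds the same request body as the curl example.
// Field names (nctx, max_completion_tokens, ngl, image_max_length) are
// copied from that example, not from any SDK reference.
function buildChatRequest(model, prompt, ngl = 999) {
  return {
    model,
    messages: [{ role: "user", content: prompt }],
    nctx: 4096,
    max_completion_tokens: 2048,
    ngl, // keep this non-zero, as suggested above
    image_max_length: 512,
  };
}

// Hypothetical helper: POSTs the body to the chat completions endpoint.
// baseUrl is an assumption, e.g. "http://localhost:8000".
async function sendChatRequest(baseUrl, body) {
  const response = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: {
      accept: "application/json",
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

A call would then look like sendChatRequest("http://localhost:8000", buildChatRequest("Qwen/Qwen3-1.7B-GGUF", "What is the weather like in Boston today?")).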

Nov 24 '25 13:11 mengshengwu

Hello, sorry for the late reply.

This is my current payload:

const payload = {
  model: 'NexaAI/DeepSeek-OCR-GGUF',
  messages: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: 'free ocr'
        },
        {
          type: 'image_url',
          image_url: {
            url: `data:image/png;base64,${base64Image}`
          }
        }
      ]
    }
  ],
  ngl: 999,
  temperature: 0.7,
  stream: false,
  nctx: 4096
};

nexa serve is running on Windows. Setting the ngl parameter appears to have fixed the issue.

A new issue: when performing OCR on the same document, I get vastly different results from the nexa CLI and from nexa serve. With nexa serve I get hallucinations, while the nexa CLI consistently produces good OCR results for the same document.

I can't understand why I get these differences.

I am using an RX 7900 XTX to run DeepSeek OCR.
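One variable that could be ruled out when comparing the two paths is sampling randomness, since the payload above sets temperature to 0.7. The sketch below is only an assumption-laden starting point: withGreedySampling is a hypothetical helper, and it assumes the server honours the temperature field and treats 0 as greedy decoding, neither of which is confirmed in this thread. The image part of the message is omitted for brevity.

```javascript
// Hypothetical helper: returns a copy of a payload with temperature set to 0,
// leaving every other field (model, ngl, nctx, ...) unchanged.
function withGreedySampling(payload) {
  return { ...payload, temperature: 0 };
}

// Same top-level fields as the payload above, without the messages array.
const original = {
  model: "NexaAI/DeepSeek-OCR-GGUF",
  ngl: 999,
  temperature: 0.7,
  stream: false,
  nctx: 4096,
};

const deterministic = withGreedySampling(original);
console.log(deterministic.temperature); // prints 0
console.log(original.temperature); // prints 0.7 (the original is not mutated)
```

If the results still differ with sampling pinned down, the difference is more likely to come from how the two paths preprocess the image or apply default parameters.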

Nov 26 '25 22:11 parotech123

Ok just make it work for me plz

Keith cox


Nov 27 '25 01:11 keithem