
python api - how to pass raw image data to the C code (not path, not server)

Open sujitvasanth opened this issue 1 month ago • 2 comments

Firstly, well done! It's really great to be able to run inference so easily. But serious users need a few more features.

Here is my functional code snippet for running Qwen3-VL inference on Windows:

from nexaai.vlm import VLM, GenerationConfig
from nexaai.common import ModelConfig, MultiModalMessage, MultiModalMessageContent

# Initialize model
model_path = "NexaAI/Qwen3-VL-8B-Instruct-GGUF/Qwen3-VL-8B-Instruct.Q4_0.gguf"
mmproj_path = "NexaAI/Qwen3-VL-8B-Instruct-GGUF/mmproj.F32.gguf"
m_cfg = ModelConfig()
vlm = VLM.from_(name_or_path=model_path, m_cfg=m_cfg, plugin_id="nexaml", mmproj_path=mmproj_path)

# Create multimodal conversation
conversation = [MultiModalMessage(role="system", 
                                content=[MultiModalMessageContent(type="text", text="You are a helpful assistant.")])]

# Add user message with image
contents = [
    MultiModalMessageContent(type="text", text="Describe this image in detail"),
    MultiModalMessageContent(type="image", text=r"C:\Users\44741\Pictures\Screenshots\bus.jpg")
]
conversation.append(MultiModalMessage(role="user", content=contents))

# Apply chat template and generate
prompt = vlm.apply_chat_template(conversation)
for token in vlm.generate_stream(prompt, g_cfg=GenerationConfig(max_tokens=1000, image_paths=[r"C:\Users\44741\Pictures\Screenshots\bus.jpg"])):
    print(token, end="", flush=True)

It's great that I can run quantised Qwen3-VL inference on Windows much faster than with Hugging Face Transformers, but:

  1. How do you specify a pre-downloaded model directory, e.g. C:\Users\44741\Desktop\Qwen3-VL-8B-Instruct-GGUF\Qwen3-VL-8B-Instruct.Q4_0.gguf?
  2. How do you pass image data (raw bytes or a PIL image) directly to the inference module rather than a file path? There is significant time lost in saving a file to the hard disk and having to load it back on the model's end; the data should at least stay in RAM or VRAM. Example: computer use, where an image captured by MSS should go straight to the C program for inference (see the sketch below).
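
For reference, here is a minimal sketch of what the round trip in item 2 looks like today (assuming the mss package; the file name and monitor index are illustrative):

import mss
import mss.tools

with mss.mss() as sct:
    # Grab the primary monitor; the frame is already in RAM at this point.
    shot = sct.grab(sct.monitors[1])
    # But today the frame has to hit the disk before the SDK can see it:
    mss.tools.to_png(shot.rgb, shot.size, output="frame.png")

# ...and the SDK then reads it back from disk via
# GenerationConfig(..., image_paths=["frame.png"])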

sujitvasanth · Oct 26 '25 16:10

Re: 1 (how to specify a pre-downloaded model directory) — I was able to solve this by passing the full local paths:

model_path = "C:/Users/44741/Desktop/Qwen3-VL-8B-Instruct-GGUF/Qwen3-VL-8B-Instruct.Q4_0.gguf"
mmproj_path = "C:/Users/44741/Desktop/Qwen3-VL-8B-Instruct-GGUF/mmproj.F32.gguf"
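
These drop straight into the unchanged VLM.from_ call from the snippet above (a quick sketch; assumes the GGUF files were already downloaded to that folder):

vlm = VLM.from_(name_or_path=model_path, m_cfg=ModelConfig(), plugin_id="nexaml", mmproj_path=mmproj_path)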

Now, how about raw image data?

sujitvasanth · Oct 26 '25 16:10

Good question — currently, passing in-memory image data (e.g., raw bytes or PIL.Image) directly to the inference module isn’t supported yet. The SDK expects image inputs as file paths for now.

We do plan to support direct in-memory (RAM/VRAM) data transfer in a future update to reduce I/O overhead.
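
In the meantime, a temp-file round trip works as a stopgap. A minimal sketch (the helper name is illustrative; assumes Pillow is installed and reuses GenerationConfig from the snippet above):

import os
import tempfile

from PIL import Image  # assumption: Pillow is available
from nexaai.vlm import GenerationConfig

def stream_with_pil_image(vlm, prompt, img: Image.Image):
    # Round-trip through a temporary file, since the SDK currently
    # only accepts image paths.
    tmp = tempfile.NamedTemporaryFile(suffix=".png", delete=False)
    tmp.close()  # close first so Windows allows reopening by name
    try:
        img.save(tmp.name)
        cfg = GenerationConfig(max_tokens=1000, image_paths=[tmp.name])
        for token in vlm.generate_stream(prompt, g_cfg=cfg):
            yield token
    finally:
        os.unlink(tmp.name)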

Thanks for the feedback — this is definitely on our radar.

mengshengwu · Oct 27 '25 08:10