sdecoder
> Hi, @sdecoder Could you try to use --load_model_on_cpu? Thank you so much for providing such a precious hint! I will give it a try immediately. Hopefully it will work. :D
> Hi, @sdecoder Could you try to use --load_model_on_cpu? Thank you so much! It works! Also I am really curious to know if we can use CPU to build the...
> @sdecoder Do you mean the weights are too big to be stored on the GPU (26GB > 24GB), so you need to offload some (or all) weights to...
> That warning comes from onnxruntime and not TensorRT. Could you try asking on the onnxruntime repo? Hello there. I mean no offense, but once the onnx file is convert...
I tried the command found here: https://docs.sglang.ai/references/nvidia_jetson.html to build sglang: **CUDA_VERSION=12.6 jetson-containers build sglang** and got the following error: Successfully installed annotated-types-0.7.0 pydantic-2.10.6 pydantic-core-2.27.2 tiktoken-0.9.0 xgrammar-0.1.14...
> it is working with cuda 12.8 https://pypi.jetson-ai-lab.dev/jp6/cu128 and solved Hello. I am using the sglang container you mentioned: ref: https://github.com/dusty-nv/jetson-containers/tree/master/packages/llm/sglang ref: https://hub.docker.com/r/dustynv/sglang/tags docker pull dustynv/sglang:0.4.4-r36.4.0-cu128-24.04 docker ps CONTAINER...
> yesterday I uploaded the latest wheels to pypi, you can: pip3 install --force-reinstall sglang vllm flashinfer --index-url https://pypi.jetson-ai-lab.dev/jp6/cu126 Thank you so much for your great work! I have tried...