jetson-generative-ai-playground
Add Max Context Len value to work around the context length 1 error
Originally, --max-model-len=1 was set, which caused the "context length 1" error:
"This model's maximum context length is 1 tokens. However, you requested 28 tokens in the messages, Please reduce the length of the messages."
This workaround generates a command like the one below instead of passing --max-model-len=1.
Verified vlm.py with JAO 64GB and JAO 32GB (gemma-3-4b-it), and with Orin NX 16GB (gemma-3-1b-it).
docker run -it --rm \
  --name llm_server \
  --gpus all \
  -p 9000:9000 \
  -e DOCKER_PULL=always --pull always \
  -e HF_TOKEN=${HUGGINGFACE_TOKEN} \
  -e HF_HUB_CACHE=/root/.cache/huggingface \
  -v /mnt/nvme/cache:/root/.cache \
  dustynv/vllm:0.7.4-r36.4.0-cu128-24.04 \
  vllm serve google/gemma-3-4b-it \
    --host=0.0.0.0 --port=9000 --dtype=auto --max-num-seqs=1 --max-model-len=8192 --gpu-memory-utilization=0.75
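
For context, a minimal sketch of how the serve arguments could be assembled with a configurable context length instead of a hard-coded --max-model-len=1. The function and parameter names here are hypothetical illustrations, not the actual vlm.py code:

# Hypothetical sketch: build the 'vllm serve' arguments with an explicit,
# configurable max context length (replacing the hard-coded value of 1).
def build_vllm_serve_args(model, port=9000, max_context_len=8192,
                          gpu_memory_utilization=0.75):
    return [
        "vllm", "serve", model,
        "--host=0.0.0.0",
        f"--port={port}",
        "--dtype=auto",
        "--max-num-seqs=1",
        f"--max-model-len={max_context_len}",  # the workaround: never 1
        f"--gpu-memory-utilization={gpu_memory_utilization}",
    ]

# Example: reproduce the trailing arguments of the command shown above.
print(" ".join(build_vllm_serve_args("google/gemma-3-4b-it")))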