
Unable to modify inference hyperparameters (temperature, top_p, seed) in vila-infer

cravirajan opened this issue 5 months ago · 0 comments

📝 Issue Description

The vila-infer command does not expose generation hyperparameters such as temperature, top_p, and max_new_tokens through its CLI interface. This severely limits control over model generation behavior and makes it difficult to integrate VILA into production systems that require fine-grained inference control.

🔄 Steps to Reproduce

✅ Working Commands (Basic Inference)

Image inference - works fine:

vila-infer \
  --model-path Efficient-Large-Model/VILA1.5-3b \
  --conv-mode vicuna_v1 \
  --text "Please describe the image" \
  --media demo_images/demo_img.png

Video inference - works fine:

vila-infer \
  --model-path Efficient-Large-Model/VILA1.5-3b \
  --conv-mode vicuna_v1 \
  --text "Please describe the video" \
  --media https://huggingface.co/datasets/Efficient-Large-Model/VILA-inference-demos/resolve/main/OAI-sora-tokyo-walk.mp4

❌ Failed Commands (With Hyperparameters)

Attempting to add generation parameters - fails:

vila-infer \
  --model-path Efficient-Large-Model/VILA1.5-3b \
  --conv-mode vicuna_v1 \
  --text "Please describe the image" \
  --media demo_images/demo_img.png \
  --temperature 0.7 \
  --top_p 0.9 \
  --max_new_tokens 512
# Error: unrecognized arguments: --temperature --top_p --max_new_tokens

💡 Expected vs Actual Implementation

🎯 Expected Implementation (What Should Work)

Based on the internal script structure, vila-infer should support these parameters:

# Expected CLI interface
parser.add_argument("--temperature", type=float, default=0.7)
parser.add_argument("--top_p", type=float, default=0.9) 
parser.add_argument("--max_new_tokens", type=int, default=100)

# Expected generation_kwargs usage
generation_kwargs = {
    "temperature": args.temperature,
    "top_p": args.top_p,
    "max_new_tokens": args.max_new_tokens,
}

response = model.generate_content(prompt, response_format=response_format, **generation_kwargs)
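If generate_content() ultimately delegates to Hugging Face's generate() (an assumption about VILA's internals, not confirmed here), the kwargs above map directly onto transformers' GenerationConfig. Note that temperature and top_p only take effect when sampling is enabled:

from transformers import GenerationConfig

# Hypothetical mapping of the CLI arguments above onto a transformers
# GenerationConfig; do_sample=True is required for temperature/top_p
# to have any effect during generation.
generation_config = GenerationConfig(
    do_sample=True,
    temperature=args.temperature,
    top_p=args.top_p,
    max_new_tokens=args.max_new_tokens,
)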

🐛 Actual Implementation (Current Limitation)

The current vila-infer command does not expose the **generation_kwargs parameters that are clearly supported by the underlying model.generate_content() method.

🔧 Current Workarounds and Limitations

📱 FastAPI Server Implementation Issue

We've implemented a FastAPI server wrapper for VILA inference, but it cannot pass inference hyperparameters to the underlying vila-infer command:

from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
    request_data = await request.json()

    # ✅ Can extract OpenAI-compatible parameters
    temperature = request_data.get("temperature", 0.7)
    top_p = request_data.get("top_p", 1.0)
    max_tokens = request_data.get("max_tokens", 2048)

    # ❌ Cannot pass them to vila-infer
    cmd = [
        "vila-infer",
        "--model-path", MODEL_PATH,
        "--conv-mode", CONV_MODE,
        "--text", text,
        # These parameters are not supported:
        # "--temperature", str(temperature),    # ❌ Not available
        # "--top_p", str(top_p),                # ❌ Not available
        # "--max_tokens", str(max_tokens),      # ❌ Not available
    ]
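
Until the flags exist, one stopgap is to skip the subprocess entirely and load the model in the server process, calling generate_content() directly. A minimal sketch, assuming the llava package exposes llava.load() and llava.Image() as in the repo's Python examples, and that generate_content() accepts the generation kwargs shown earlier (both assumptions, not verified against the current release):

import llava

# Load once at server startup instead of shelling out per request.
# Assumption: llava.load() / llava.Image() follow the repo's Python examples.
model = llava.load(MODEL_PATH)

def run_inference(text: str, image_path: str, temperature: float,
                  top_p: float, max_tokens: int) -> str:
    generation_kwargs = {
        "temperature": temperature,
        "top_p": top_p,
        "max_new_tokens": max_tokens,
    }
    # Assumption: generate_content() forwards these kwargs to generation,
    # as in the expected usage shown above.
    return model.generate_content(
        [llava.Image(image_path), text],
        **generation_kwargs,
    )

This also avoids the per-request model load that the subprocess approach incurs, though it does not remove the need for the CLI flags themselves.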

🌍 Environment Details

  • Models tested: VILA1.5-3b, NVILA-15B, VILA1.5-40b, Llama-3-VILA1.5-8B
  • Input types: Images (.jpg, .png), Videos (.mp4, .webm)
  • Conversation modes: vicuna_v1, auto
  • Integration: FastAPI server wrapper for OpenAI-compatible API

cravirajan · Jun 17 '25 09:06