FastChat

Add `--max-model-len` param and description to `serve.vllm_worker`?

Open Lanture1064 opened this issue 11 months ago • 0 comments

Currently `--max-model-len` can be passed to vLLM through kwargs, but could it be added as a default parameter with its own description, like gpu-utilization-limit? It is often needed when using models that accept long contexts.
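To illustrate the request, here is a minimal sketch of what an explicit, documented option might look like. It is not the actual `serve.vllm_worker` code: `build_parser`, `engine_kwargs`, the `--model-path` default, and the help text are all hypothetical, and the sketch assumes the engine accepts a `max_model_len` keyword.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the worker's argument parser.
    parser = argparse.ArgumentParser(description="Worker CLI sketch (illustrative only)")
    parser.add_argument("--model-path", type=str, default="my-model")  # placeholder default
    parser.add_argument(
        "--max-model-len",
        type=int,
        default=None,
        help="Maximum context length (in tokens) the engine should allow. "
        "Leave unset to use the engine's default.",
    )
    return parser


def engine_kwargs(args: argparse.Namespace) -> dict:
    # Build the keyword arguments that would be forwarded to the engine.
    # Assumption: the engine takes `model` and `max_model_len` keywords.
    kwargs = {"model": args.model_path}
    if args.max_model_len is not None:
        # Only forward the limit when the user set it explicitly.
        kwargs["max_model_len"] = args.max_model_len
    return kwargs


if __name__ == "__main__":
    cli_args = build_parser().parse_args()
    print(engine_kwargs(cli_args))
```

With something like this in place, the option would show up in `--help` output rather than only being reachable through passthrough kwargs.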

Lanture1064, Mar 29 '24 03:03