FastChat
Add `--max-model-len` param and description to `serve.vllm_worker`?
Currently `--max-model-len` can be passed through to vLLM as a kwarg, but could it be promoted to a first-class default parameter, like `gpu-utilization-limit`? It is often needed when serving models that accept long contexts.
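A minimal sketch of what the requested change might look like: exposing `--max-model-len` directly on the worker's argument parser instead of relying on pass-through kwargs. The parser setup here is illustrative, not the actual `serve.vllm_worker` code; the flag name mirrors vLLM's `max_model_len` engine argument, where `None` means vLLM derives the limit from the model config.

```python
import argparse

# Hypothetical worker CLI sketch; not FastChat's actual argparse wiring.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--max-model-len",
    type=int,
    default=None,  # None: let vLLM infer the context length from the model config
    help="Maximum model context length; caps KV-cache size for long-context models.",
)

args = parser.parse_args(["--max-model-len", "8192"])
print(args.max_model_len)  # 8192
```

With a dedicated flag, users of long-context models would get a documented `--help` entry instead of having to discover the kwarg pass-through.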