Add max-model-len argument to vllm worker

Open · aliasaria opened this issue 1 year ago

This is needed to load Llama 3.1 8B on an RTX 3090; otherwise, we run out of GPU memory.
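As a rough sketch of how such a flag might be wired up, the code below parses a `--max-model-len` option and forwards it only when it is set, so vLLM keeps its default (the model's own context length) otherwise. The parser and the `build_parser`, `engine_kwargs`, and `parse` helpers are hypothetical stand-ins, not FastChat's actual `vllm_worker` code, and the model path is only an example. The dictionary keys `model` and `max_model_len` are chosen to match vLLM's engine-argument field names.

```python
import argparse
from typing import Any, Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the vLLM worker's CLI parser."""
    parser = argparse.ArgumentParser(description="vLLM worker (sketch)")
    parser.add_argument("--model-path", type=str, required=True)
    parser.add_argument(
        "--max-model-len",
        type=int,
        default=None,
        help="Cap on prompt + generated tokens per sequence; "
        "if omitted, vLLM uses the model's own context length.",
    )
    return parser


def engine_kwargs(args: argparse.Namespace) -> Dict[str, Any]:
    """Map parsed CLI flags onto vLLM engine-argument names (assumed wiring)."""
    kwargs: Dict[str, Any] = {"model": args.model_path}
    # Forward the cap only when the user set it, so vLLM's default applies otherwise.
    if args.max_model_len is not None:
        kwargs["max_model_len"] = args.max_model_len
    return kwargs


def parse(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Parse argv and return the keyword arguments for the engine."""
    return engine_kwargs(build_parser().parse_args(argv))


if __name__ == "__main__":
    # Example: cap the context at 8192 tokens (example model path).
    print(parse([
        "--model-path", "meta-llama/Meta-Llama-3.1-8B-Instruct",
        "--max-model-len", "8192",
    ]))
```

In a real worker, a dictionary like this would presumably be passed as `AsyncEngineArgs(**kwargs)`. Leaving the default at `None` keeps the change backward compatible for models that already fit.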

Why are these changes needed?
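A back-of-envelope estimate shows why the full context does not fit on a 24 GB card. This sketch assumes the published Llama 3.1 8B shape (32 layers, 8 grouped-query KV heads, head dimension 128), fp16 weights and KV cache, roughly 8.03B parameters, and vLLM's default `gpu_memory_utilization` of 0.9. It ignores activation and other runtime overhead, so real limits will be somewhat lower. The helper names are hypothetical.

```python
GIB = 1024**3


def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       dtype_bytes: int = 2) -> int:
    """Bytes of KV cache per token: one key and one value vector per layer per KV head."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def max_cached_tokens(gpu_bytes: float, weight_bytes: float, per_token: int,
                      utilization: float = 0.9) -> int:
    """Tokens that fit in the memory left after weights (ignores activations/overhead)."""
    free = gpu_bytes * utilization - weight_bytes
    return max(0, int(free // per_token))


# Assumed Llama 3.1 8B shape in fp16.
per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128)
full_context = 131072      # Llama 3.1's 128K-token context window
weights = 8.03e9 * 2       # ~8.03B parameters at 2 bytes each
budget = max_cached_tokens(24 * GIB, weights, per_token)

print(f"KV cache per token: {per_token // 1024} KiB")
print(f"KV cache for the full context: {per_token * full_context / GIB:.1f} GiB")
print(f"Tokens that fit in the KV cache: ~{budget}")
print("Full context fits:", budget >= full_context)
```

Under these assumptions, each token costs 128 KiB of KV cache, so the full 128K context would need 16 GiB on top of about 15 GiB of weights. Only a few tens of thousands of tokens fit, which is why capping the model length is needed on this card.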

Related issue number (if applicable)

Checks

  • [x] I've run format.sh to lint the changes in this PR.
  • [x] I've included any doc changes needed.
  • [x] I've made sure the relevant tests are passing (if applicable).

aliasaria, Jul 25 '24 15:07