FastChat
Add max-model-len argument to vllm worker
This is needed to load Llama 3.1-8B on an RTX 3090; without capping the context length, the worker runs out of GPU memory.
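As a rough sketch of the idea (not the actual FastChat diff), the worker's argument parser would gain a `--max-model-len` option that is forwarded to the vLLM engine to bound the context window and, with it, the KV-cache allocation. The flag name matches vLLM's engine argument; the default of `None` here is an assumption meaning "use the model's native context length":

```python
import argparse

# Hypothetical sketch of wiring a --max-model-len flag into a worker's CLI.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--max-model-len",
    type=int,
    default=None,  # assumption: None means use the model's full context length
    help="Cap on the model context length; smaller values shrink the KV cache "
         "so large models fit on GPUs with less memory (e.g. an RTX 3090).",
)

# Example invocation: limit Llama 3.1-8B's 128k context to 8192 tokens.
args = parser.parse_args(["--max-model-len", "8192"])
print(args.max_model_len)
```

The parsed value would then be passed through to the engine's configuration, so the only user-visible change is the extra command-line flag.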
Why are these changes needed?
Related issue number (if applicable)
Checks
- [x] I've run `format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed.
- [x] I've made sure the relevant tests are passing (if applicable).