FastChat
Add max-model-len argument to vllm worker
This is needed to load Llama 3.1-8B on an RTX 3090; without capping the context length, the worker runs out of GPU memory.
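As a rough sketch of the idea (not the actual FastChat diff), the worker's argument parser would gain a `--max-model-len` option that is forwarded to the vLLM engine to bound the context window and, with it, the KV-cache allocation. The flag name matches vLLM's engine argument; the default of `None` here is an assumption meaning "use the model's native context length":

```python
import argparse

# Hypothetical sketch of wiring a --max-model-len flag into a worker's CLI.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--max-model-len",
    type=int,
    default=None,  # assumption: None means use the model's full context length
    help="Cap on the model context length; smaller values shrink the KV cache "
         "so large models fit on GPUs with less memory (e.g. an RTX 3090).",
)

# Example invocation: limit Llama 3.1-8B's 128k context to 8192 tokens.
args = parser.parse_args(["--max-model-len", "8192"])
print(args.max_model_len)
```

The parsed value would then be passed through to the engine's configuration, so the only user-visible change is the extra command-line flag.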
Why are these changes needed?
Related issue number (if applicable)
Checks
- [x] I've run `format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed.
- [x] I've made sure the relevant tests are passing (if applicable).