petals How to get faster inference?

Open ryanshrott opened this issue 11 months ago • 1 comments

I added my RTX 3080 to the swarm using:

    conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
    pip install git+https://github.com/bigscience-workshop/petals
    python -m petals.cli.run_server enoch/llama-65b-hf --adapters timdettmers/guanaco-65b

But inference is still pretty slow for me: about 2 minutes for 200 tokens (roughly 0.6 s per token). Why is that?
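To put a number on "slow", here is a minimal sketch of how the throughput could be measured. It is only a timing helper, not part of Petals: `tokens_per_second`, `time_generation`, and `fake_generate` are hypothetical names made up for illustration, and `fake_generate` is a stand-in for whatever client call actually produces the tokens.

```python
import time
from typing import Callable, Tuple


def tokens_per_second(n_tokens: int, seconds: float) -> float:
    """Convert a token count and a wall-clock duration into throughput."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return n_tokens / seconds


def time_generation(
    generate: Callable[[int], int], max_new_tokens: int
) -> Tuple[int, float, float]:
    """Time one call to `generate` and return (tokens, seconds, tokens/s).

    `generate` is assumed to take the requested number of new tokens and
    return how many tokens it actually produced.
    """
    start = time.perf_counter()
    produced = generate(max_new_tokens)
    elapsed = time.perf_counter() - start
    return produced, elapsed, tokens_per_second(produced, elapsed)


def fake_generate(max_new_tokens: int) -> int:
    """Hypothetical stand-in for a real generation call (assumption)."""
    time.sleep(0.01)  # pretend to do some work
    return max_new_tokens


if __name__ == "__main__":
    # The numbers from this report: 200 tokens in about 2 minutes.
    print(round(tokens_per_second(200, 120), 2))  # 1.67
    print(time_generation(fake_generate, 5))
```

Swapping `fake_generate` for a wrapper around the real generation call would give a comparable tokens-per-second figure before and after any change to the setup.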

Jul 23 '23 21:07 ryanshrott
