lorax
Multi-responses for a single inference
Feature request
LoRAX should support generating multiple completions for the same prompt in a single API call.
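To make the request concrete, here is a rough sketch of the behaviour being asked for. The `n` parameter and the list-shaped result are hypothetical: they describe the requested feature, not something the server is known to support today. The payload layout is also only an assumption for illustration, and `generate_one` is a stand-in for whatever single-generation client call is in use.

```python
from typing import Callable, Dict, List


def build_multi_request(prompt: str, n: int, max_new_tokens: int = 64) -> Dict:
    """Build a payload for the *requested* feature.

    NOTE: the "n" field is hypothetical; it is what this issue asks for.
    The overall payload layout is assumed for illustration only.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "n": n},
    }


def emulate_multi_response(
    payload: Dict, generate_one: Callable[[str, int], str]
) -> List[str]:
    """Emulate the desired behaviour: one call in, n generations out.

    generate_one is a hypothetical callable (prompt, max_new_tokens) -> text
    that performs a single generation.
    """
    params = payload["parameters"]
    return [
        generate_one(payload["inputs"], params["max_new_tokens"])
        for _ in range(params["n"])
    ]
```

With server-side support, the loop in `emulate_multi_response` would run inside the server, so the client would open one connection instead of `n`.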
Motivation
This is useful when someone wants multiple outputs for a single prompt: instead of sending many separate requests, they could get all of the outputs in a single call. It would also reduce the number of calls, which would help avoid "connection reset by peer" failures caused by excessive requests to the server.
Currently, I am using LoRAX to deploy Llama 3, and I need multiple outputs for a single request. Because of this, a lot of my requests get dropped with "connection reset by peer".
Your contribution
.