
Multi-responses for a single inference

Open mrcchef opened this issue 1 year ago • 0 comments

Feature request

LoRAX should support generating multiple completions for the same prompt through a single API call.

Motivation

This is helpful when someone wants multiple outputs for a single prompt: instead of sending many separate requests, the same result could be achieved in one call. It would also reduce the number of calls, which would help avoid "connection reset by peer" failures caused by sending too many requests to the server.

Currently, I am using LoRAX to deploy Llama 3, and I need multiple outputs for a single prompt. Because I have to send one request per output, a lot of my requests get dropped with "connection reset by peer".
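
To make the request concrete, here is a rough client-side sketch of the behavior I am emulating today: one helper that returns `n` completions for a prompt while capping how many requests are in flight at once. This is only a workaround sketch, not a proposed server implementation. The names `post_generate`, `generate_n`, and `max_in_flight` are hypothetical, and the URL, the `/generate` payload shape, the `do_sample` parameter, and the `generated_text` response field are assumptions about my deployment that may need adjusting.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def post_generate(prompt: str, url: str = "http://127.0.0.1:8080/generate") -> str:
    """Send one generation request (URL and payload shape are assumptions)."""
    body = json.dumps({
        "inputs": prompt,
        # Sampling is enabled so repeated calls can return different text.
        "parameters": {"max_new_tokens": 64, "do_sample": True},
    }).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        # Assumes the response JSON carries the text under "generated_text".
        return json.loads(resp.read())["generated_text"]


def generate_n(
    prompt: str,
    n: int,
    send: Callable[[str], str] = post_generate,
    max_in_flight: int = 4,
) -> List[str]:
    """Return n completions for one prompt, with at most max_in_flight
    requests running concurrently (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    workers = min(n, max_in_flight)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map calls send(prompt) n times and keeps the results in order.
        return list(pool.map(send, [prompt] * n))
```

Calling `generate_n("Tell me a joke", 8)` still issues eight HTTP requests; it only limits how many run concurrently. The feature requested here would replace this with a single request that returns all eight outputs.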

Your contribution

.

mrcchef avatar Nov 05 '24 06:11 mrcchef