decoupled model with non-streaming mode

Open flexwang opened this issue 1 year ago • 0 comments

It looks like if I set a model as decoupled, I can still query it in non-streaming mode. Is this expected behavior? What is the latency impact here?

flexwang • Aug 10 '23 04:08