FasterTransformer
decoupled model with non-streaming mode
It looks like even when I set the model as decoupled, I can still query it in non-streaming mode. Is this expected behavior? And what is the latency impact of doing this?