fastertransformer_backend
Can I enable streaming on an ensemble model?
In the GPT ensemble model example, can I change the FasterTransformer model to a decoupled model and enable streaming on the client side?
+1
The answer is yes.
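Here is a minimal sketch of the change in question. It assumes the ensemble example's layout, where the FasterTransformer model has its own `config.pbtxt`. Triton marks a model as decoupled through the `model_transaction_policy` field. The rest of the existing config is assumed unchanged and is not shown. On the client side, a decoupled model is served through Triton's gRPC streaming API (for example `start_stream` and `async_stream_infer` in the `tritonclient.grpc` Python client) rather than through a single request/response call.

```protobuf
# Excerpt of the FasterTransformer model's config.pbtxt (other fields unchanged).
# Decoupled mode lets the model send multiple responses for one request,
# which is what token-by-token streaming relies on.
model_transaction_policy {
  decoupled: True
}
```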
It looks like only the FT backend supports streaming; the Python backend does not.