fastertransformer_backend

Can I enable streaming on an ensemble model?

Open flexwang opened this issue 1 year ago • 3 comments

In the ensemble model example for GPT, can I change the fastertransformer model to a decoupled model and enable streaming on the client side?

flexwang avatar Jul 18 '23 05:07 flexwang

+1

jjjjohnson avatar Aug 24 '23 11:08 jjjjohnson

The answer is yes.

flexwang2 avatar Aug 24 '23 16:08 flexwang2

It looks like only the FT backend supports streaming; the Python backend does not.

jjjjohnson avatar Aug 31 '23 07:08 jjjjohnson
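
For anyone landing here, a rough client-side sketch of what the thread describes. It assumes the fastertransformer model's `config.pbtxt` has been switched to `model_transaction_policy { decoupled: True }`, that the server's gRPC endpoint is reachable, and that `tritonclient[grpc]` is installed. The helper names `on_response` and `stream_infer`, the idle timeout, and whatever URL, model name, and tensor names you pass in are illustrative assumptions, not taken from the repo; the tensor names must match your own ensemble config. Stopping after an idle timeout is a simplification, since a decoupled model can send any number of responses per request.

```python
import queue
from functools import partial


def on_response(responses, result, error):
    """Stream callback: invoked once per streamed response (or error)."""
    responses.put(error if error is not None else result)


def stream_infer(url, model_name, inputs_np, idle_timeout_s=5.0):
    """Send one request over a gRPC stream and yield results as they arrive.

    inputs_np maps input tensor names to numpy arrays (hypothetical names;
    use the ones declared in your ensemble's config.pbtxt).
    """
    # Imported lazily so on_response can be used without the client installed.
    import tritonclient.grpc as grpcclient
    from tritonclient.utils import np_to_triton_dtype

    inputs = []
    for name, array in inputs_np.items():
        tensor = grpcclient.InferInput(
            name, list(array.shape), np_to_triton_dtype(array.dtype)
        )
        tensor.set_data_from_numpy(array)
        inputs.append(tensor)

    responses = queue.Queue()
    client = grpcclient.InferenceServerClient(url=url)
    try:
        # Decoupled responses are delivered through the stream callback.
        client.start_stream(callback=partial(on_response, responses))
        client.async_stream_infer(model_name=model_name, inputs=inputs)
        while True:
            try:
                item = responses.get(timeout=idle_timeout_s)
            except queue.Empty:
                # Assumption: no response within the idle window means done.
                break
            if isinstance(item, Exception):
                raise item
            yield item
    finally:
        client.stop_stream()
        client.close()
```

Each yielded item is an `InferResult`, so something like `result.as_numpy("OUTPUT_0")` would read a partial output, with `OUTPUT_0` again standing in for whatever output name your ensemble declares.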