fastertransformer_backend Can I enable streaming on an ensemble model?

Open flexwang opened this issue 2 years ago • 3 comments

In the GPT ensemble model example, can I change the fastertransformer model to a decoupled model and enable streaming on the client side?

Jul 18 '23 05:07 flexwang

Aug 24 '23 11:08 jjjjohnson

The answer is yes.
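For anyone landing here, a rough client-side sketch follows. It assumes the fastertransformer model's `config.pbtxt` sets `model_transaction_policy { decoupled: True }` and that the client is a `tritonclient.grpc.InferenceServerClient`, whose `start_stream`, `async_stream_infer`, and `stop_stream` methods are the only calls used. The helper name `stream_generate` and the `on_chunk` handler are made up for illustration, and the sketch relies on `stop_stream()` returning only after the pending responses have been delivered, so treat it as a starting point rather than the official example.

```python
def stream_generate(client, model_name, inputs, on_chunk):
    """Send one request over a gRPC stream and pass each response to on_chunk.

    `client` is assumed to behave like tritonclient.grpc.InferenceServerClient.
    `stream_generate` and `on_chunk` are hypothetical names, not part of Triton.
    """
    errors = []

    def callback(result, error):
        # The stream callback is invoked once per response with (result, error).
        if error is not None:
            errors.append(error)
        else:
            on_chunk(result)

    client.start_stream(callback=callback)
    try:
        # A decoupled model may send many responses for this single request.
        client.async_stream_infer(model_name=model_name, inputs=inputs)
    finally:
        # Assumption: closing the stream waits for the outstanding responses.
        client.stop_stream()

    if errors:
        raise errors[0]
```

With a real server this would be called with something like `tritonclient.grpc.InferenceServerClient(url="localhost:8001")` and a list of `InferInput` objects that match the ensemble's inputs. The URL, model name, and tensor names depend on your deployment and are not given in this thread.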

Aug 24 '23 16:08 flexwang2

It looks like only the FT backend supports streaming; the Python backend does not.

Aug 31 '23 07:08 jjjjohnson