multi-batch support
Hi,
Two questions, please:

- Do you support multi-prompt batching in any way? I tried passing batched input_ids, but generation failed with "Unsupport multi-batch input-ids": https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/llm/runtime/graph/init.py#L137 (a sketch of what I tried is below). Is there another way?
- Do you plan on integrating with Hugging Face's text-generation-inference?
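For reference, this is a minimal sketch of roughly what I tried, assuming the ITREX transformers-style API; the model name, prompts, and generation settings are placeholders:

```python
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "EleutherAI/gpt-j-6b"  # placeholder model
prompts = ["Once upon a time", "The capital of France is"]

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-J has no pad token by default

model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)

# Tokenizing several prompts at once yields input_ids of shape (2, seq_len);
# this multi-batch tensor is what triggered "Unsupport multi-batch input-ids".
inputs = tokenizer(prompts, padding=True, return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_new_tokens=32)
```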
Thanks
Hi @NaamaKadosh,
Regarding 1: we support static batching (with left padding) for beam search, but so far only for some model architectures (e.g. GPT-J and GPT-NeoX). We plan to support other model architectures, as well as batched top-k, top-p, and greedy decoding, later (by the end of the year), and will let you know when that is done.
Regarding 2: we will discuss it, thanks.
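For illustration, here is a rough sketch of left-padded static batching with beam search, assuming the ITREX transformers-style API; the model name, prompts, and generation settings below are placeholders, not a definitive recipe:

```python
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "EleutherAI/gpt-j-6b"  # one of the supported architectures
prompts = ["Once upon a time", "She opened the door and"]

# Left padding keeps the prompts right-aligned, so generation starts from
# the same position in every row of the batch.
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)

inputs = tokenizer(prompts, padding=True, return_tensors="pt")
outputs = model.generate(
    inputs.input_ids,
    num_beams=4,        # beam search is the batched decoding mode supported so far
    max_new_tokens=32,
)
for text in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(text)
```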
We have integrated TGI into NeuralChat (https://github.com/intel/intel-extension-for-transformers/pull/1180/files), but there is currently no way to combine the runtime with TGI.
Hi @NaamaKadosh, we now support a continuous batching mechanism when use_neural_speed is enabled; please refer to https://github.com/intel/neural-speed/blob/main/docs/continuous_batching.md for more details and usage. We will add a related ITREX example soon. If you have no other questions, we will close this issue. Thanks.
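As a pointer, here is a minimal sketch of routing generation through Neural Speed via the use_neural_speed flag; the model name and prompts are placeholders, and the linked continuous_batching doc is the authoritative reference for the batching behavior:

```python
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder model
prompts = ["What is deep learning?", "Write a haiku about spring."]

tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token

# Route execution through Neural Speed, which handles batching internally.
model = AutoModelForCausalLM.from_pretrained(
    model_name, load_in_4bit=True, use_neural_speed=True
)

inputs = tokenizer(prompts, padding=True, return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```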