kevinintel
Yes, we will support it soon. After the feature is enabled, I will update this issue.
We integrated TGI into NeuralChat: https://github.com/intel/intel-extension-for-transformers/pull/1180/files, but there is no way to combine LLM Runtime and TGI right now.
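For reference, a minimal sketch of the NeuralChat Python API (`build_chatbot` / `PipelineConfig` as documented in the repo; the model name is a placeholder, and the TGI-specific serving setup lives in the PR above):

```python
from intel_extension_for_transformers.neural_chat import PipelineConfig, build_chatbot

# Placeholder model; see the linked PR for the TGI serving configuration.
config = PipelineConfig(model_name_or_path="Intel/neural-chat-7b-v3-1")
chatbot = build_chatbot(config)
print(chatbot.predict("Tell me about Intel Xeon Scalable Processors."))
```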
We will support it soon.
The 10700 has AVX2, and LLM Runtime supports it. LLM Runtime is not just a tool for generating quantized models; we also provide fusion and kernels for inference. You can check the performance data in...
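As an example, a minimal sketch following the LLM Runtime usage pattern from the README (model name and prompt are placeholders):

```python
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

# Placeholder model; any LLM Runtime-supported model should work on an AVX2 CPU.
model_name = "Intel/neural-chat-7b-v3-1"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer("Once upon a time,", return_tensors="pt").input_ids

# load_in_4bit=True routes inference through LLM Runtime's fused int4 kernels.
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```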
We improved performance on client CPUs, but still can't find an 8167M machine. Closing this issue for now; you can try the performance yourself.
Hi, I will close this issue if you don't have any further concerns.
It's for multi-modal training, but optimization is WIP.
Someone tried low-bit quantization for LLaVA: https://arxiv.org/pdf/2306.00978.pdf, and we will try to quantize it as well.
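To illustrate the low-bit idea, here is a sketch of plain round-to-nearest group-wise int4 weight quantization (the AWQ paper linked above additionally scales weights using activation statistics; that part is omitted here):

```python
import torch

def quantize_int4_groupwise(weight: torch.Tensor, group_size: int = 128):
    """Round-to-nearest 4-bit group-wise quantization of a 2-D weight matrix.

    Returns integer codes plus per-group scales so the original weight
    can be approximately reconstructed. Assumes in_features is divisible
    by group_size.
    """
    out_features, in_features = weight.shape
    w = weight.reshape(out_features, in_features // group_size, group_size)
    # Symmetric quantization: one scale per group, codes in [-8, 7].
    scales = w.abs().amax(dim=-1, keepdim=True) / 7.0
    q = torch.clamp(torch.round(w / scales), -8, 7).to(torch.int8)
    return q, scales

def dequantize(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    return (q.float() * scales).reshape(q.shape[0], -1)

# Quick check on a random weight: reconstruction error should be small.
w = torch.randn(256, 512)
q, s = quantize_int4_groupwise(w)
print(f"mean abs error: {(w - dequantize(q, s)).abs().mean():.4f}")
```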
We can optimize LLaVA in https://github.com/intel/neural-compressor/pull/1797; we will add examples.
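Once examples land, usage should look roughly like Neural Compressor's existing weight-only post-training flow. A sketch under that assumption (the checkpoint and config values below are illustrative, not the final recipe):

```python
from transformers import LlavaForConditionalGeneration
from neural_compressor import PostTrainingQuantConfig, quantization

# Placeholder checkpoint; substitute the LLaVA model you want to quantize.
model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")

# Illustrative RTN weight-only settings; the exact LLaVA recipe will ship with the example.
conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={".*": {"weight": {"bits": 4, "group_size": 128, "scheme": "sym"}}},
)
q_model = quantization.fit(model, conf)
q_model.save("./llava-int4")
```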
Closing this issue for now until the user gives more details.