kevinintel

Results 62 comments of kevinintel

Yes, we will support it soon. After the feature is enabled, I will update this issue.

We integrated TGI into NeuralChat: https://github.com/intel/intel-extension-for-transformers/pull/1180/files, but there is currently no way to combine the runtime and TGI.

will support it soon

The 10700 has AVX2, and LLM Runtime supports it. It's not just a tool for generating quantized models; we also provide fusion and kernels for inference. You can check the performance data in...
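Since the comment above hinges on whether the CPU exposes AVX2, here is a minimal sketch (my own helper, not part of LLM Runtime) that checks the advertised CPU flags on Linux by parsing `/proc/cpuinfo`:

```python
def has_avx2(cpuinfo_path="/proc/cpuinfo"):
    """Return True if any 'flags' line in cpuinfo lists avx2 (Linux only)."""
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                # Each logical CPU has a 'flags' line; avx2 appears as a
                # space-separated token when the CPU supports it.
                if line.startswith("flags") and " avx2" in f" {line.split(':', 1)[-1]} ":
                    return True
    except OSError:
        pass  # non-Linux or unreadable; treat as unsupported
    return False

if __name__ == "__main__":
    print("AVX2 available:", has_avx2())
```

On non-Linux systems this returns `False`; a portable check would need a library such as `py-cpuinfo` instead.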

We improved performance on client CPUs, but still can't find an 8167M machine. Closing this issue for now; you can try it and check the performance.

Hi, I will close this issue if you don't have further concerns.

It's for multi-modal training, but optimization is WIP.

Someone tried low-bit quantization for LLaVA: https://arxiv.org/pdf/2306.00978.pdf, and we will try to quantize it.
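The linked paper studies low-bit weight-only quantization. As a rough illustration of the core idea (plain symmetric per-group int4 round-to-nearest, not the paper's activation-aware scaling and not the exact scheme LLM Runtime uses), a NumPy sketch looks like:

```python
import numpy as np

def quantize_int4(w, group_size=32):
    """Symmetric per-group int4 quantization of a 2-D float weight matrix."""
    flat = w.reshape(-1, group_size)
    # One scale per group, mapping the group's max magnitude to 7 (int4 max).
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero
    q = np.clip(np.round(flat / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale, shape):
    """Recover an approximate float matrix from int4 codes and group scales."""
    return (q.astype(np.float32) * scale).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 64)).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s, w.shape)
max_err = np.abs(w - w_hat).max()  # bounded by half a quantization step per group
```

Real deployments pack two int4 codes per byte and fuse dequantization into the matmul kernels; this sketch only shows the numerics.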

We can optimize LLaVA in https://github.com/intel/neural-compressor/pull/1797 and will add examples.