kevinintel
Yes, we will support it soon. After the feature is enabled, I will update this issue.
We integrated TGI into NeuralChat: https://github.com/intel/intel-extension-for-transformers/pull/1180/files, but there is no way to combine LLM Runtime and TGI right now.
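For reference, a minimal sketch of the NeuralChat Python API (`build_chatbot` / `PipelineConfig` as documented in the repo; the model name is a placeholder, and the TGI-specific serving setup lives in the PR above):

```python
from intel_extension_for_transformers.neural_chat import PipelineConfig, build_chatbot

# Placeholder model; see the linked PR for the TGI serving configuration.
config = PipelineConfig(model_name_or_path="Intel/neural-chat-7b-v3-1")
chatbot = build_chatbot(config)
print(chatbot.predict("Tell me about Intel Xeon Scalable Processors."))
```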
We will support it soon.
The 10700 has AVX2, and LLM Runtime supports it. LLM Runtime is not just a tool for generating quantized models; we also provide fusion and kernels for inference. You can check the performance data in...
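As an example, a minimal sketch following the LLM Runtime usage pattern from the README (model name and prompt are placeholders):

```python
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

# Placeholder model; any LLM Runtime-supported model should work on an AVX2 CPU.
model_name = "Intel/neural-chat-7b-v3-1"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer("Once upon a time,", return_tensors="pt").input_ids

# load_in_4bit=True routes inference through LLM Runtime's fused int4 kernels.
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```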
We improved performance on client CPUs, but still can't find an 8167M machine. Closing this issue for now; you can try the performance yourself.
Hi, I will close this issue if you don't have any further concerns.
It's for multi-modal training, but optimization is WIP.
Someone tried low-bit quantization for LLaVA: https://arxiv.org/pdf/2306.00978.pdf, and we will try to quantize it as well.
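To illustrate the low-bit idea, here is a sketch of plain round-to-nearest group-wise int4 weight quantization (the AWQ paper linked above additionally scales weights using activation statistics; that part is omitted here):

```python
import torch

def quantize_int4_groupwise(weight: torch.Tensor, group_size: int = 128):
    """Round-to-nearest 4-bit group-wise quantization of a 2-D weight matrix.

    Returns integer codes plus per-group scales so the original weight
    can be approximately reconstructed. Assumes in_features is divisible
    by group_size.
    """
    out_features, in_features = weight.shape
    w = weight.reshape(out_features, in_features // group_size, group_size)
    # Symmetric quantization: one scale per group, codes in [-8, 7].
    scales = w.abs().amax(dim=-1, keepdim=True) / 7.0
    q = torch.clamp(torch.round(w / scales), -8, 7).to(torch.int8)
    return q, scales

def dequantize(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    return (q.float() * scales).reshape(q.shape[0], -1)

# Quick check on a random weight: reconstruction error should be small.
w = torch.randn(256, 512)
q, s = quantize_int4_groupwise(w)
print(f"mean abs error: {(w - dequantize(q, s)).abs().mean():.4f}")
```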
We can optimize LLaVA in https://github.com/intel/neural-compressor/pull/1797; we will add examples.
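Once examples land, usage should look roughly like Neural Compressor's existing weight-only post-training flow. A sketch under that assumption (the checkpoint and config values below are illustrative, not the final recipe):

```python
from transformers import LlavaForConditionalGeneration
from neural_compressor import PostTrainingQuantConfig, quantization

# Placeholder checkpoint; substitute the LLaVA model you want to quantize.
model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")

# Illustrative RTN weight-only settings; the exact LLaVA recipe will ship with the example.
conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={".*": {"weight": {"bits": 4, "group_size": 128, "scheme": "sym"}}},
)
q_model = quantization.fit(model, conf)
q_model.save("./llava-int4")
```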
Closing this issue for now until the user gives more details.