
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Results: 95 intel-extension-for-transformers issues

Trying to build a project using the python3.10-alpine Docker image as a base; the project has intel-extension-for-transformers as a dependency and I hit this error:

```
9.175 Collecting intel-extension-for-transformers==1.3.2
9.193 Downloading...
```

```python
from intel_extension_for_transformers.neural_chat import PipelineConfig
from intel_extension_for_transformers.neural_chat import build_chatbot
from intel_extension_for_transformers.neural_chat import plugins

plugins.retrieval.enable = True
plugins.retrieval.args["input_path"] = "./docs/"
config = PipelineConfig(plugins=plugins)
chatbot = build_chatbot(config)
```

When I run this code every time I add some...


I am trying to explore the backend server. After resolving dependency issues, I tried to start the server, but the system doesn't show any running backend server, nor do the logs help...


`stream` defaults to `True`, but the string output does not behave like streaming ![image](https://github.com/intel/intel-extension-for-transformers/assets/32321821/0660fc3e-61c2-463a-a116-5d11b63a2ec4)

Hey there! I'm trying to run llama3-8b-instruct with Intel Extension for Transformers. Here's my code:

```python
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer...
```

## Type of Change
Added a feature to use EAGLE (speculative sampling) with ITREX, as discussed with the ITREX team and Haim Barad from my team. Added an example script on how...

## Type of Change
Documentation

API changed or not: no

## Description
Add streaming LLM doc

## Type of Change
Bug fix: use token latency instead of total inference time to measure performance.

## Description
The workshop notebook measures total inference time for performance instead...
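The distinction matters because total inference time conflates one-time prompt processing with steady-state decoding. A minimal sketch of token-level measurement (the helper and the dummy token stream below are my own illustration, not code from the notebook or ITREX):

```python
import time

def measure_token_latency(generate_tokens):
    """Record the time gap before each yielded token, then split the
    measurement into first-token latency (prompt processing + first decode)
    and average next-token latency (steady-state decoding speed)."""
    latencies = []
    start = time.perf_counter()
    for _ in generate_tokens():
        now = time.perf_counter()
        latencies.append(now - start)
        start = now
    first_token_latency = latencies[0]
    avg_next_token_latency = sum(latencies[1:]) / max(len(latencies) - 1, 1)
    return first_token_latency, avg_next_token_latency

# Dummy generator standing in for a model's streaming token output.
def dummy_stream():
    for token_id in range(5):
        time.sleep(0.01)  # simulate per-token decode work
        yield token_id

first, avg = measure_token_latency(dummy_stream)
```

Reporting `first` and `avg` separately gives a fairer performance picture than a single wall-clock total, especially when comparing prompts of different lengths.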

New prompt format for llama3 https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/
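Per the linked model card, Llama 3 wraps each conversation turn in header tokens and ends it with `<|eot_id|>`. A minimal sketch of assembling a single-turn prompt by hand (the helper name is my own; in practice `tokenizer.apply_chat_template` does this for you):

```python
def build_llama3_prompt(system_msg: str, user_msg: str) -> str:
    """Assemble a single-turn Llama 3 chat prompt, ending with an open
    assistant header so the model generates the reply."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_msg}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt("You are a helpful assistant.", "Hello!")
```

Note the format differs from Llama 2's `[INST]`/`<<SYS>>` markers, so prompt-building code needs updating when switching models.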

Adding an end-to-end finetuning and evaluation workflow for text generation using GLUE MNLI