
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Ma...

608 ipex-llm issues, sorted by recently updated

What are the default values of max_generated_tokens, top_k, top_p, and temperature? If the user doesn't set all of the parameters in `generate_kwargs`, as in the example below, default values should be used. How do...

user issue
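Whichever wrapper is in use, these knobs map onto Hugging Face `generate()` arguments, and one way to avoid depending on implicit defaults is to pass them explicitly. A minimal sketch using ipex-llm's transformers-style API (the model id and parameter values are illustrative; the actual defaults come from the model's generation config and the transformers library, not from this sketch):

```python
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # illustrative
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("What is AI?", return_tensors="pt")
# Pass the sampling parameters explicitly instead of relying on defaults.
output = model.generate(
    **inputs,
    max_new_tokens=64,   # plays the role of max_generated_tokens
    do_sample=True,
    top_k=50,
    top_p=0.9,
    temperature=0.7,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```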

I am trying to transform a string into a llama2-specific or llama3-specific prompt in the function `completion_to_prompt()`. Is there a way to pass the parameter **model_option** as an input? Or else, I...

user issue
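One standard-Python way to thread an extra argument into a single-argument callback like `completion_to_prompt` is `functools.partial`; a minimal sketch (the prompt templates and the `model_option` values are illustrative):

```python
from functools import partial

def completion_to_prompt(completion: str, model_option: str = "llama2") -> str:
    # Wrap the raw completion in the chat template for the chosen model family.
    if model_option == "llama3":
        return (
            "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
            f"{completion}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
        )
    # Default: llama2-style [INST] template.
    return f"<s>[INST] {completion} [/INST]"

# Bind model_option up front so the callback still takes a single string,
# which is what the framework expects.
llama3_prompt_fn = partial(completion_to_prompt, model_option="llama3")
print(llama3_prompt_fn("What is AI?"))
```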

Logs use 'bigdl-llm' while converting and loading models into q4 binary format; they should use `ipex-llm`.
```
bigdl-llm: loading model from ./bigdl_llm_llama_q4_0.bin
loading bigdl-llm model: format = ggjt v3 (latest)
loading...
```

## Description

Gemma shares the same RotaryEmbedding layer with phi3.

I want to switch between the llama2-7b-chat and llama3-8b models, but it costs a lot of memory if I load both. How can I clear one if I am going to load...

user issue
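A common way to free one model before loading the other is to drop all Python references and then empty the device cache; a minimal sketch, assuming the first model was loaded onto an Intel GPU ('xpu'):

```python
import gc
import torch

# Assumes `model` and `tokenizer` are the objects loaded earlier in the script.
del model
del tokenizer
gc.collect()  # let Python reclaim the host-side tensors

# Release cached device memory on the Intel GPU before loading the next model.
if hasattr(torch, "xpu") and torch.xpu.is_available():
    torch.xpu.empty_cache()
```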

## Description

Initial patch function for inference.

### 2. User API changes

Support `llm_patch(train=False, device='xpu', load_in_low_bit='sym_int4')`. Only the following code needs to be added at the beginning to run Hugging Face inference...
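The PR's own example is truncated above; the following is only a hedged sketch of how such a patch-then-run flow would look. The import path, the model id, and the input device handling are assumptions; only the call signature is taken from the description.

```python
# Assumption: llm_patch is importable from the top-level ipex_llm package.
from ipex_llm import llm_patch
llm_patch(train=False, device='xpu', load_in_low_bit='sym_int4')

# After the patch, unmodified Hugging Face inference code is expected to run
# with the weights quantized to sym_int4 and placed on the Intel GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # illustrative
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is AI?", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```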

## Description

Add continuous-batching-like partial prefilling to reduce the memory peak during prefilling.
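As an illustration of the general idea (not this PR's implementation), prefilling can be done in fixed-size slices that reuse the KV cache, so peak activation memory is bounded by the slice length rather than the full prompt length; a minimal sketch with a plain Hugging Face model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # illustrative small model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

input_ids = tokenizer("a long prompt " * 200, return_tensors="pt").input_ids
chunk_size = 128
past_key_values = None

with torch.no_grad():
    # Feed the prompt through the model in chunks, accumulating the KV cache.
    for start in range(0, input_ids.shape[1], chunk_size):
        chunk = input_ids[:, start:start + chunk_size]
        out = model(input_ids=chunk, past_key_values=past_key_values, use_cache=True)
        past_key_values = out.past_key_values

# past_key_values now holds the prefilled KV cache for the whole prompt;
# decoding can continue token by token from here.
```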

## Description

Support q4_0_rtn.
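For readers unfamiliar with the term, RTN refers to round-to-nearest quantization; a tiny illustrative sketch of the general idea for one 4-bit weight group (this is not ipex-llm's exact q4_0 layout):

```python
import numpy as np

w = np.random.randn(32).astype(np.float32)      # one group of weights
scale = np.abs(w).max() / 7.0                   # map onto the signed 4-bit range [-7, 7]
q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)   # round-to-nearest quantization
w_hat = q.astype(np.float32) * scale            # dequantized approximation
print("max abs error:", np.abs(w - w_hat).max())
```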

Hi, I saved the LLaVA model in 4-bit using generate.py from https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/PyTorch-Models/Model/llava

```python
model = optimize_model(model)
# Added these lines below in generate.py
if SAVE_PATH:
    model.save_low_bit(save_path_model)
    tokenizer.save_pretrained(save_path_model)
    print(f"Model and tokenizer...
```

user issue
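For reloading the saved checkpoint later without re-quantizing, ipex-llm's optimize_model flow documents a low-bit load path; a hedged sketch, assuming the same helpers apply to this LLaVA example (`build_llava_model()` is a hypothetical stand-in for the example's own model construction code, and the path is illustrative):

```python
from transformers import AutoTokenizer
from ipex_llm.optimize import low_memory_init, load_low_bit

save_path_model = "./llava-4bit"  # illustrative path

# Build an empty model skeleton, then fill it with the previously saved
# low-bit weights instead of quantizing from scratch.
with low_memory_init():
    model = build_llava_model()  # hypothetical stand-in for the example's model setup
model = load_low_bit(model, save_path_model)

tokenizer = AutoTokenizer.from_pretrained(save_path_model)
```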

The code below works when I use the Mixtral model from Ollama directly, but when I use the IPEX-LLM-optimized Mixtral model, the tool does not work. This is an easy...

user issue