ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Ma...
What are the default values of max_generated_tokens, top_k, top_p, and temperature? If the user doesn't set all parameters in `generate_kwargs`, as in the example below, it should use default values. How do...
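A minimal sketch of how unset sampling parameters usually resolve. The values below mirror the Hugging Face `GenerationConfig` defaults (`temperature=1.0`, `top_k=50`, `top_p=1.0`, and a fallback output length of 20 tokens); the exact defaults a given integration applies may differ, so treat them as illustrative only, and `resolve_generate_kwargs` is a hypothetical helper, not an ipex-llm API:

```python
# Illustrative defaults, modeled on Hugging Face GenerationConfig.
DEFAULT_GENERATE_KWARGS = {
    "max_new_tokens": 20,   # HF falls back to max_length=20 when unset
    "temperature": 1.0,
    "top_k": 50,
    "top_p": 1.0,
}

def resolve_generate_kwargs(user_kwargs):
    """Merge user-supplied generate_kwargs over the defaults."""
    merged = dict(DEFAULT_GENERATE_KWARGS)
    merged.update(user_kwargs)
    return merged

print(resolve_generate_kwargs({"temperature": 0.7}))
# {'max_new_tokens': 20, 'temperature': 0.7, 'top_k': 50, 'top_p': 1.0}
```

Any key the user supplies overrides the default; everything else falls through unchanged.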
I am trying to transform a string into a llama2-specific or llama3-specific prompt in the function `completion_to_prompt()`. Is there a way to pass the parameter **model_option** as an input? Or else, I...
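One common way to thread an extra parameter into a one-argument callback is `functools.partial`. A sketch under assumptions: `model_option` is the hypothetical parameter from the question, and the templates follow the published Llama 2 (`<s>[INST] ... [/INST]`) and Llama 3 (header-token) chat formats:

```python
from functools import partial

def completion_to_prompt(completion, model_option="llama2"):
    # model_option is a hypothetical parameter added for illustration.
    if model_option == "llama3":
        return ("<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
                f"{completion}<|eot_id|>"
                "<|start_header_id|>assistant<|end_header_id|>\n\n")
    return f"<s>[INST] {completion} [/INST]"

# A framework that expects a one-argument callback can be handed a bound copy:
llama3_prompt = partial(completion_to_prompt, model_option="llama3")
```

`llama3_prompt("Hello")` then produces the Llama 3 formatting without the caller ever seeing `model_option`.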
Logs use 'bigdl-llm' while converting and loading models into q4 binary format; they should use `ipex-llm`.

```
bigdl-llm: loading model from ./bigdl_llm_llama_q4_0.bin
loading bigdl-llm model: format = ggjt v3 (latest)
loading...
```
## Description
Gemma shares the same RotaryEmbedding layer as phi3.
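For context on why the layer can be shared: rotary position embedding is model-agnostic; it only rotates consecutive pairs of head dimensions by a position-dependent angle. A dependency-free sketch of the standard formulation (not the actual ipex-llm kernel, which operates on batched tensors):

```python
import math

def rotary_embed(x, pos, base=10000.0):
    """Apply rotary position embedding to one head vector x (even length).

    Pair (x[i], x[i+1]) is rotated by theta = pos * base**(-i/d). Because
    the formula depends only on position and head size, one implementation
    can serve multiple architectures (e.g. Gemma and phi3).
    """
    d = len(x)
    out = list(x)
    for i in range(0, d, 2):
        theta = pos * (base ** (-i / d))
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

At position 0 the rotation is the identity, and the vector norm is always preserved.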
I want to switch between the llama2-7b-chat and llama3-8b models, but it costs a lot of memory if I load both. How do I clear one when I am going to load...
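A minimal sketch of the usual swap pattern: drop every Python reference to the first model, force garbage collection, then load the second. `load_model` is a placeholder for the real `from_pretrained` call, and the XPU cache release (which requires PyTorch with IPEX) is left commented out so the sketch runs anywhere:

```python
import gc

def load_model(name):
    # Placeholder for AutoModelForCausalLM.from_pretrained(name, ...)
    return {"name": name}

model = load_model("llama2-7b-chat")
# ... run inference ...

# Drop every reference before loading the next model, then force collection.
del model
gc.collect()
# On an Intel GPU, also release cached device memory (requires IPEX):
# import torch; torch.xpu.empty_cache()

model = load_model("llama3-8b")
```

Note that `del` only removes the name; memory is freed only once no other reference (pipeline objects, notebook output cells, etc.) still points at the model.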
## Description
Initial patch function for inference.

### 2. User API changes
Support `llm_patch(train=False, device='xpu', load_in_low_bit='sym_int4')`. Only the following code needs to be added at the beginning to run Hugging Face inference...
## Description
Add continuous-batching-like partial prefilling to reduce the memory peak during prefilling.
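The idea above can be sketched in a few lines: instead of running the whole prompt through the model at once, feed it in fixed-size chunks and carry the KV cache forward, so peak activation memory scales with the chunk size rather than the prompt length. A hypothetical illustration of the technique, not the actual ipex-llm implementation:

```python
def chunked_prefill(tokens, chunk_size, forward):
    """Run prefill over the prompt in fixed-size chunks.

    'forward' consumes one chunk plus the KV cache built so far and
    returns the extended cache; only chunk_size tokens' worth of
    activations are live at any point.
    """
    kv_cache = []
    for start in range(0, len(tokens), chunk_size):
        kv_cache = forward(tokens[start:start + chunk_size], kv_cache)
    return kv_cache

# Toy 'forward' that just appends the chunk to the cache:
cache = chunked_prefill(list(range(10)), 4, lambda chunk, kv: kv + chunk)
print(cache)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

After the loop the cache is identical to a single-pass prefill; only the peak memory profile differs.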
## Description
Support `q4_0_rtn`.
Hi, I saved the LLaVA model in 4-bit using `generate.py` from https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/PyTorch-Models/Model/llava

```
model = optimize_model(model)
# Added these lines below in generate.py
if SAVE_PATH:
    model.save_low_bit(save_path_model)
    tokenizer.save_pretrained(save_path_model)
    print(f"Model and tokenizer...
```
The code below works when I use the Mixtral model from Ollama directly, but when I use the IPEX-LLM optimized Mixtral model, the tool does not work. This is an easy...