bigdl-llm-tutorial

Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using bigdl-llm

12 bigdl-llm-tutorial issues

I used the code here: https://github.com/intel-analytics/ipex-llm-tutorial/blob/original-bigdl-llm/Chinese_Version/ch_6_GPU_Acceleration/6_1_GPU_Llama2-7B.md but it failed. Can you help with this? Thanks.

```python
from bigdl.llm.transformers import AutoModelForCausalLM, AutoModel
from transformers import LlamaTokenizer, AutoTokenizer

chatglm3_6b = 'D:/AI_projects/Langchain-Chatchat/llm_model/THUDM/chatglm2-6b'
model_in_4bit = AutoModel.from_pretrained(pretrained_model_name_or_path=chatglm3_6b, ...
```
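For reference, a minimal sketch of the 4-bit loading path this excerpt appears to be attempting, assuming a local ChatGLM checkpoint at the path shown (the path and prompt are placeholders, not taken from the issue):

```python
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer

# Assumed local checkpoint path; substitute your own.
model_path = 'D:/AI_projects/Langchain-Chatchat/llm_model/THUDM/chatglm2-6b'

# load_in_4bit=True applies bigdl-llm's INT4 optimization at load time;
# trust_remote_code=True is required for ChatGLM's custom modeling code.
model = AutoModel.from_pretrained(model_path,
                                  load_in_4bit=True,
                                  trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "What is AI?"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```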

My code is based on bigdl-llm:

```python
from langchain import LLMChain, PromptTemplate
from bigdl.llm.langchain.llms import TransformersLLM
from langchain.memory import ConversationBufferWindowMemory

chatglm3_6b = 'D:/AI_projects/Langchain-Chatchat/llm_model/THUDM/chatglm3-6b'
llm_model_path = chatglm3_6b  # path to the Hugging Face LLM model
CHATGLM_V3_PROMPT_TEMPLATE ...
```
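A minimal sketch of how the truncated chain might be completed, using the bigdl-llm LangChain integration's `TransformersLLM.from_model_id` entry point; the template text and variable names below are illustrative assumptions, not taken from the issue:

```python
from langchain import LLMChain, PromptTemplate
from langchain.memory import ConversationBufferWindowMemory
from bigdl.llm.langchain.llms import TransformersLLM

llm_model_path = 'D:/AI_projects/Langchain-Chatchat/llm_model/THUDM/chatglm3-6b'

# Illustrative template; the issue's CHATGLM_V3_PROMPT_TEMPLATE is truncated.
template = "{history}\nQuestion: {human_input}\nAnswer:"
prompt = PromptTemplate(input_variables=["history", "human_input"],
                        template=template)

# from_model_id loads the model with bigdl-llm's low-bit optimizations applied.
llm = TransformersLLM.from_model_id(
    model_id=llm_model_path,
    model_kwargs={"trust_remote_code": True},
)

# Keep only the last k turns of the conversation in the prompt.
memory = ConversationBufferWindowMemory(k=2)
chain = LLMChain(llm=llm, prompt=prompt, memory=memory)
print(chain.run("What is AI?"))
```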

```
❯ pip install --pre --upgrade ipex-llm[all]
zsh: no matches found: ipex-llm[all]
```
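This is zsh behavior rather than a packaging problem: zsh treats the square brackets as a glob pattern and aborts when nothing on disk matches. Quoting the requirement (or escaping the brackets) prevents the expansion:

```
pip install --pre --upgrade "ipex-llm[all]"
```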

- The `output_path` in the sample Python code is missing quotes
- `torch` isn't properly imported
- `tokenizer` should be loaded before loading the dataset (see the sketch below)
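A hypothetical sketch of what the corrected snippet might look like; the notebook's actual variable names, model, and dataset are not shown in the issue, so everything here is assumed for illustration only:

```python
import torch  # fix: torch was used without being imported
from transformers import AutoTokenizer
from datasets import load_dataset

output_path = "output.wav"  # fix: the path literal needs quotes

# fix: load the tokenizer before the dataset that depends on it
tokenizer = AutoTokenizer.from_pretrained("openai/whisper-tiny")  # assumed model id
dataset = load_dataset("librispeech_asr", "clean", split="validation")  # assumed dataset
```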

Replace the deprecated demo audio files/voice dataset

When using bigdl-llm in a production environment, Python performance is too poor. Could you provide a C++ inference library and an OpenAI-compatible API?

We need to update the GPU installation instructions (including PyTorch 2.1 support and Windows installation) in Chapters 6 and 7, referring to https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html
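For reference, the XPU install command documented at that page took roughly this form; the exact extras and wheel index may have changed since, so verify against the linked page:

```
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
```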

![img_v3_0269_c20cbf2c-b81b-4866-914b-d470413adebg](https://github.com/intel-analytics/bigdl-llm-tutorial/assets/74948610/a9c19a72-767f-4089-97a6-5ae4c6fd661e) Each time I interact with the model, the memory it occupies increases and is not released. As a result, when there are many conversations,...
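One common cause is the ever-growing chat history being re-encoded on every turn, so truncating the history is the first thing to check. Beyond that, a workaround sketch for reclaiming device memory between turns, assuming an XPU device with intel-extension-for-pytorch loaded (the helper name is mine, not from the issue):

```python
import gc
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers torch.xpu)

def free_xpu_memory():
    """Release Python garbage and the cached XPU allocator blocks."""
    gc.collect()
    torch.xpu.empty_cache()

# Usage sketch: after each conversation turn, drop references to the
# per-turn tensors first (e.g. `del input_ids, output_ids`), then call
# free_xpu_memory() so the cached allocations can be returned to the device.
```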

![image](https://github.com/intel-analytics/bigdl-llm-tutorial/assets/74948610/dba22357-ada1-4c00-8cbf-86b991db37a5) After putting the model and inputs on the XPU, the model now works on an Intel laptop, but the inference time is about 588 seconds, which is too long for...
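One thing worth ruling out before profiling further: the first `generate()` call on the XPU includes one-time kernel compilation, so timings should be taken after a warm-up run, with the model in half precision. A minimal sketch, assuming `model` and `tokenizer` are already loaded with bigdl-llm as in the earlier excerpts:

```python
import time
import torch

model = model.half().to('xpu')  # low precision is much faster on the iGPU
inputs = tokenizer("What is AI?", return_tensors="pt").to('xpu')

with torch.inference_mode():
    # Warm-up run: absorbs one-time kernel compilation cost.
    model.generate(inputs.input_ids, max_new_tokens=32)

    start = time.time()
    output = model.generate(inputs.input_ids, max_new_tokens=32)
    torch.xpu.synchronize()  # wait for XPU work to finish before reading the clock
print(f"timed run: {time.time() - start:.1f}s")
```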

Update the Chapter 5 notebooks 5_1_ChatBot and 5_1_2_Speech Recognition in both the English and Chinese versions