Yanli
When running llama2-7b/Mistral-7B-v0.1 int4/fp8 multi-batch on Max1100, we easily hit an OOM issue. It looks like when multi-batch is enabled and the model is run for multiple iterations, memory usage keeps increasing...
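A minimal sketch of one way to watch for that growth across iterations, assuming an XPU build of PyTorch via intel_extension_for_pytorch and ipex-llm's transformers-style loader (bigdl-llm used `from bigdl.llm.transformers import ...`); the model path, batch size, and iteration count here are placeholders, not values from the report:

```python
# Sketch: observe XPU memory across repeated multi-batch generate() calls.
import torch
import intel_extension_for_pytorch as ipex  # provides the torch.xpu namespace
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "meta-llama/Llama-2-7b-hf"  # assumed model path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True,
                                             trust_remote_code=True)
model = model.to("xpu")

prompts = ["Once upon a time"] * 4  # multi-batch: batch size 4 (assumed)
inputs = tokenizer(prompts, return_tensors="pt").to("xpu")

for i in range(10):  # multiple iterations, as described in the report
    with torch.inference_mode():
        model.generate(**inputs, max_new_tokens=512)
    torch.xpu.synchronize()
    # If this number climbs monotonically, per-iteration buffers
    # (e.g., the KV cache) are likely not being released.
    print(f"iter {i}: allocated = {torch.xpu.memory_allocated() / 1e9:.2f} GB")
```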
When running Deci/DeciLM-7B int4/fp8 multi-batch on Max1100, comparing bigdl-llm=2.5.0b20240124 against bigdl-llm=2.5.0b20240118, single-batch latency improved from 12.3ms to 9.6ms for 512/512, but the newer build easily runs into OOM...
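For context on how a per-token number like 12.3ms for a 512-in/512-out run is typically collected, a hedged sketch that averages decode time over the output tokens; the warm-up policy and the separate timing of the first token that a real harness would do are omitted here as assumptions:

```python
# Sketch: rough average next-token latency for a fixed-length decode.
import time
import torch

def next_token_latency_ms(model, input_ids, out_tokens=512):
    with torch.inference_mode():
        torch.xpu.synchronize()
        t0 = time.perf_counter()
        model.generate(input_ids, max_new_tokens=out_tokens,
                       min_new_tokens=out_tokens, do_sample=False)
        torch.xpu.synchronize()
        elapsed = time.perf_counter() - t0
    # Coarse average over all generated tokens; first-token (prefill)
    # latency is folded in rather than reported separately.
    return elapsed / out_tokens * 1000
```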
Summary: Steps: 1. Install ipex-llm following the steps below (the 20240629 build is used):

```bash
conda create -n ipex_llm python=3.9
source activate ipex_llm
conda install -c conda-forge -y libstdcxx-ng=12
conda install -c conda-forge -y gperftools=2.10
```
...
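Before running the full workload, a quick smoke test (assuming the conda environment above) can confirm that the installs succeeded and the XPU backend sees the Max1100:

```python
# Sketch: verify ipex-llm and the XPU device are visible.
import torch
import intel_extension_for_pytorch as ipex
import ipex_llm  # import alone confirms the package resolved

print("torch:", torch.__version__, "| ipex:", ipex.__version__)
print("XPU available:", torch.xpu.is_available())
print("Device:", torch.xpu.get_device_name(0))
```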