Yanli
When running llama2-7b/Mistral-7B-v0.1 int4/fp8 multi-batch on Max1100, we easily hit an OOM issue. It looks like when multi-batch is enabled and the model is run for multiple iterations, memory usage keeps increasing...
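A minimal sketch of one way to watch for that growth across iterations, assuming an XPU build of PyTorch via intel_extension_for_pytorch and ipex-llm's transformers-style loader (bigdl-llm used `from bigdl.llm.transformers import ...`); the model path, batch size, and iteration count here are placeholders, not values from the report:

```python
# Sketch: observe XPU memory across repeated multi-batch generate() calls.
import torch
import intel_extension_for_pytorch as ipex  # provides the torch.xpu namespace
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "meta-llama/Llama-2-7b-hf"  # assumed model path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True,
                                             trust_remote_code=True)
model = model.to("xpu")

prompts = ["Once upon a time"] * 4  # multi-batch: batch size 4 (assumed)
inputs = tokenizer(prompts, return_tensors="pt").to("xpu")

for i in range(10):  # multiple iterations, as described in the report
    with torch.inference_mode():
        model.generate(**inputs, max_new_tokens=512)
    torch.xpu.synchronize()
    # If this number climbs monotonically, per-iteration buffers
    # (e.g., the KV cache) are likely not being released.
    print(f"iter {i}: allocated = {torch.xpu.memory_allocated() / 1e9:.2f} GB")
```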
When running Deci/DeciLM-7B int4/fp8 multi-batch on Max1100, comparing bigdl-llm=2.5.0b20240124 against bigdl-llm=2.5.0b20240118, single-batch latency improved from 12.3ms to 9.6ms for 512/512, but the newer build easily runs into OOM...
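For context on how a per-token number like 12.3ms for a 512-in/512-out run is typically collected, a hedged sketch that averages decode time over the output tokens; the warm-up policy and the separate timing of the first token that a real harness would do are omitted here as assumptions:

```python
# Sketch: rough average next-token latency for a fixed-length decode.
import time
import torch

def next_token_latency_ms(model, input_ids, out_tokens=512):
    with torch.inference_mode():
        torch.xpu.synchronize()
        t0 = time.perf_counter()
        model.generate(input_ids, max_new_tokens=out_tokens,
                       min_new_tokens=out_tokens, do_sample=False)
        torch.xpu.synchronize()
        elapsed = time.perf_counter() - t0
    # Coarse average over all generated tokens; first-token (prefill)
    # latency is folded in rather than reported separately.
    return elapsed / out_tokens * 1000
```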
Summary: Steps: 1. Install ipex-llm following the steps below (the 20240629 build is used):

```bash
conda create -n ipex_llm python=3.9
source activate ipex_llm
conda install -c conda-forge -y libstdcxx-ng=12
conda install -c conda-forge -y gperftools=2.10
```
...
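Before running the full workload, a quick smoke test (assuming the conda environment above) can confirm that the installs succeeded and the XPU backend sees the Max1100:

```python
# Sketch: verify ipex-llm and the XPU device are visible.
import torch
import intel_extension_for_pytorch as ipex
import ipex_llm  # import alone confirms the package resolved

print("torch:", torch.__version__, "| ipex:", ipex.__version__)
print("XPU available:", torch.xpu.is_available())
print("Device:", torch.xpu.get_device_name(0))
```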