[Max1100/bigdl-llm] Met OOM easily when running llama2-7b/Mistral-7B-v0.1 int4/fp8 multi-batch
When running llama2-7b/Mistral-7B-v0.1 int4/fp8 multi-batch on Max1100, we easily hit an OOM issue. It looks like, once multi-batch is enabled and the model is run for multiple iterations, GPU memory keeps increasing with each iteration.

HW: Max1100
OS: Ubuntu 22.04
SW: oneAPI 2024.0 / bigdl-llm 2.5.0b20240118 (based on torch 2.1)
GPU driver: https://dgpu-docs.intel.com/releases/stable_775_20_20231219.html

How to reproduce:
- Create a conda env and install bigdl-llm via "pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu"
- Run the attached run.sh on Max1100 and monitor the GPU memory via "sudo xpu-smi dump -m 0,1,2,3,4,5,18"; a rough sketch of the benchmark loop is shown after this list
- The GPU memory increases on every iteration and we hit OOM after several iterations
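Since run.sh is attached rather than inlined, the following is only a minimal sketch of the kind of multi-batch benchmark it runs (512-token prompts x 8 batch, 512 generated tokens, sym_int4 on XPU). The model path, prompt construction, and iteration count are placeholders, not taken from run.sh:

```python
# Hypothetical approximation of the attached run.sh benchmark, NOT the original script.
import time
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  -- registers the 'xpu' device
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM

MODEL_PATH = "meta-llama/Llama-2-7b-hf"      # placeholder path
BATCH, IN_LEN, OUT_LEN, ITERS = 8, 512, 512, 12

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, load_in_low_bit="sym_int4", trust_remote_code=True
).to("xpu")

# Build a batch of 512-token prompts; the prompt content is irrelevant here.
prompt = "hello " * IN_LEN
inputs = tokenizer([prompt] * BATCH, return_tensors="pt",
                   truncation=True, max_length=IN_LEN).to("xpu")

with torch.inference_mode():
    for i in range(ITERS):
        start = time.time()
        model.generate(**inputs, max_new_tokens=OUT_LEN, do_sample=False)
        torch.xpu.synchronize()
        print(f"iter {i + 1}: {time.time() - start:.2f} sec total")
```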
We could not reproduce this problem on our machine (Max1100).

Environment:
bigdl-llm version: 2.5.0b20240118
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max 1100 OpenCL 3.0 NEO [23.30.26918.50]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1100 1.3 [1.3.26918]
Here is the log:
- INFO - intel_extension_for_pytorch auto imported
loading model...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 13.34it/s]
- INFO - Converting the current model to sym_int4 format......
LlamaAttention(
(q_proj): LowBitLinear(in_features=4096, out_features=4096, bias=False)
(k_proj): LowBitLinear(in_features=4096, out_features=4096, bias=False)
(v_proj): LowBitLinear(in_features=4096, out_features=4096, bias=False)
(o_proj): LowBitLinear(in_features=4096, out_features=4096, bias=False)
(rotary_emb): LlamaRotaryEmbedding()
)
warming up for 10 iterations...
finished warmup
prefill (512 tokens x 8 batch) + generation (512 tokens x 8 batch):
0
iter 1: xx sec total
1
iter 2: xx sec total
2
iter 3: xx sec total
3
iter 4: xx sec total
4
iter 5: xx sec total
5
iter 6: xx sec total
6
iter 7: xx sec total
7
iter 8: xx sec total
8
iter 9: xx sec total
9
iter 10: xx sec total
10
iter 11: xx sec total
11
Here are the GPU memory stats; the used memory alternates between ~21.6 GiB and ~23.0 GiB across iterations but does not keep growing:
Timestamp, DeviceId, GPU Utilization (%), GPU Power (W), GPU Frequency (MHz), GPU Core Temperature (Celsius Degree), GPU Memory Temperature (Celsius Degree), GPU Memory Utilization (%), GPU Memory Used (MiB)
01:02:55.000, 0, 99.76, 196.87, 1550, N/A, N/A, 43.91, 21579.29
01:02:56.000, 0, 99.81, 196.76, 1550, N/A, N/A, 43.91, 21579.29
01:02:57.000, 0, 99.82, 197.18, 1550, N/A, N/A, 43.91, 21579.29
01:02:58.000, 0, 99.85, 197.55, 1550, N/A, N/A, 43.91, 21579.29
01:02:59.000, 0, 89.60, 184.65, 0, N/A, N/A, 43.91, 21579.29
01:03:00.000, 0, 0.00, 27.95, 0, N/A, N/A, 43.91, 21579.29
01:03:01.000, 0, 0.00, 27.88, 0, N/A, N/A, 43.91, 21579.29
01:03:02.000, 0, 0.00, 27.85, 0, N/A, N/A, 43.91, 21579.29
01:03:03.000, 0, 0.00, 27.78, 0, N/A, N/A, 43.91, 21579.29
01:03:04.000, 0, 9.05, 51.09, 1550, N/A, N/A, 46.94, 23066.18
01:03:05.000, 0, 99.33, 209.28, 1550, N/A, N/A, 46.94, 23066.18
01:03:06.000, 0, 99.67, 191.56, 1550, N/A, N/A, 46.94, 23066.18
01:03:07.000, 0, 99.77, 192.01, 1550, N/A, N/A, 46.94, 23066.18
01:03:08.000, 0, 99.82, 193.04, 1550, N/A, N/A, 46.94, 23066.18
01:03:09.000, 0, 99.78, 192.70, 1550, N/A, N/A, 46.94, 23066.18
01:03:10.000, 0, 99.82, 192.79, 1550, N/A, N/A, 46.94, 23066.18
01:03:11.000, 0, 99.82, 192.93, 1550, N/A, N/A, 46.94, 23066.18
01:03:12.000, 0, 99.82, 193.61, 1550, N/A, N/A, 46.94, 23066.18
01:03:13.000, 0, 99.79, 193.89, 1550, N/A, N/A, 46.94, 23066.18
01:03:14.000, 0, 99.69, 194.24, 1550, N/A, N/A, 46.94, 23066.18
01:03:15.000, 0, 99.51, 194.23, 1550, N/A, N/A, 43.91, 21579.28
01:03:16.000, 0, 99.55, 195.14, 1550, N/A, N/A, 43.91, 21579.28
01:03:17.000, 0, 99.55, 195.87, 1550, N/A, N/A, 43.91, 21579.28
01:03:18.000, 0, 99.54, 195.74, 1550, N/A, N/A, 43.91, 21579.28
01:03:19.000, 0, 99.74, 196.17, 1550, N/A, N/A, 43.91, 21579.28
01:03:20.000, 0, 99.71, 196.35, 1550, N/A, N/A, 43.91, 21579.28
01:03:21.000, 0, 99.82, 197.02, 1550, N/A, N/A, 43.91, 21579.28
01:03:22.000, 0, 99.82, 197.39, 1550, N/A, N/A, 43.91, 21579.28
01:03:23.000, 0, 99.83, 197.36, 1550, N/A, N/A, 43.91, 21579.28
01:03:24.000, 0, 99.85, 197.49, 1550, N/A, N/A, 43.91, 21579.28
01:03:25.000, 0, 40.15, 109.33, 0, N/A, N/A, 43.91, 21579.28
01:03:26.000, 0, 4.43, 46.86, 0, N/A, N/A, 43.91, 21579.28
01:03:27.000, 0, 0.00, 27.86, 0, N/A, N/A, 43.91, 21579.28
01:03:28.000, 0, 0.00, 27.75, 0, N/A, N/A, 43.91, 21579.28
01:03:29.000, 0, 0.00, 27.72, 0, N/A, N/A, 43.91, 21579.28
01:03:30.000, 0, 58.30, 191.29, 1550, N/A, N/A, 46.94, 23066.18
01:03:31.000, 0, 99.38, 191.68, 1550, N/A, N/A, 46.94, 23066.18
01:03:32.000, 0, 99.57, 191.20, 1550, N/A, N/A, 46.94, 23066.18
01:03:33.000, 0, 99.68, 191.98, 1550, N/A, N/A, 46.94, 23066.18
01:03:34.000, 0, 99.81, 192.40, 1550, N/A, N/A, 46.94, 23066.18
01:03:35.000, 0, 99.82, 192.78, 1550, N/A, N/A, 46.94, 23066.18
01:03:36.000, 0, 99.81, 193.62, 1550, N/A, N/A, 46.94, 23066.18
01:03:37.000, 0, 99.82, 193.19, 1550, N/A, N/A, 46.94, 23066.18
01:03:38.000, 0, 99.81, 193.47, 1550, N/A, N/A, 46.94, 23066.18
01:03:39.000, 0, 99.80, 193.92, 1550, N/A, N/A, 46.94, 23066.18
01:03:40.000, 0, 98.81, 194.58, 1550, N/A, N/A, 43.91, 21579.29
01:03:41.000, 0, 99.59, 195.55, 1550, N/A, N/A, 43.91, 21579.29
01:03:42.000, 0, 99.60, 196.16, 1550, N/A, N/A, 43.91, 21579.29
01:03:43.000, 0, 99.73, 196.77, 1550, N/A, N/A, 43.91, 21579.29
01:03:44.000, 0, 99.75, 196.72, 1550, N/A, N/A, 43.91, 21579.29
01:03:45.000, 0, 99.77, 196.74, 1550, N/A, N/A, 43.91, 21579.29
01:03:46.000, 0, 99.80, 197.54, 1550, N/A, N/A, 43.91, 21579.29
01:03:47.000, 0, 99.75, 197.94, 1550, N/A, N/A, 43.91, 21579.29
01:03:48.000, 0, 99.84, 197.89, 1550, N/A, N/A, 43.91, 21579.29
01:03:49.000, 0, 99.82, 197.96, 1550, N/A, N/A, 43.91, 21579.29
01:03:50.000, 0, 99.83, 198.46, 1550, N/A, N/A, 43.91, 21579.29
01:03:51.000, 0, 1.77, 45.10, 0, N/A, N/A, 43.91, 21579.29
01:03:52.000, 0, 0.00, 27.82, 0, N/A, N/A, 43.91, 21579.29
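As a side note, if it helps to cross-check the xpu-smi numbers, per-iteration device memory can also be printed from inside the script, assuming the torch.xpu memory-stat helpers exposed by intel_extension_for_pytorch are available in this build (a sketch, not part of the run above):

```python
# Hedged sketch: print XPU allocator stats, e.g. after each generate() call.
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401

def report_xpu_memory(tag: str) -> None:
    # memory_allocated/memory_reserved are the IPEX XPU analogues of the CUDA APIs.
    alloc = torch.xpu.memory_allocated() / 1024**2
    reserved = torch.xpu.memory_reserved() / 1024**2
    print(f"{tag}: allocated={alloc:.1f} MiB, reserved={reserved:.1f} MiB")
```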