dingbaorong
We used https://qianwen-res.oss-cn-beijing.aliyuncs.com/profile.py to reproduce the results. Here are the results on NVIDIA's GPU: the memory usage reported by `torch.cuda.max_memory_allocated` matches the official report, but the memory usage reported by...
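For reference, the peak-memory number discussed above comes from `torch.cuda.max_memory_allocated`. A minimal sketch of measuring it around a workload (the helper name `peak_gpu_memory_mb` is ours, not from profile.py; it returns `None` when CUDA is unavailable):

```python
import importlib.util


def peak_gpu_memory_mb(fn):
    """Run fn() and return peak CUDA memory in MiB, or None if CUDA is unavailable."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    if not torch.cuda.is_available():
        return None
    # Clear previous peak stats so we only measure fn()'s allocations.
    torch.cuda.reset_peak_memory_stats()
    fn()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / (1024 ** 2)
```

Usage would look like `peak_gpu_memory_mb(lambda: model.generate(inputs))`. Note this reports memory allocated by the PyTorch caching allocator, which can differ from what system tools such as `nvidia-smi` show.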
1. The performance of Qwen-14B-Chat on our machines is good. Here is our configuration:
   - **Machine**: i9 14900K; Arc A770; 64 GB DDR5 memory (Linux)
   - **bigdl's version**: 2.5.0b20231213
   - **Kernel version**: 5.19.0-41-generic
   ...
Here is how to downgrade the Linux kernel: https://github.com/intel-analytics/bigdl-llm-tutorial/blob/main/ch_6_GPU_Acceleration/environment_setup.md
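After following that guide, you can confirm which kernel is actually running. A quick check (the `running_kernel` helper is just our wrapper around the standard library):

```python
import platform


def running_kernel() -> str:
    """Return the running kernel release string, e.g. '5.19.0-41-generic'."""
    return platform.release()


print(running_kernel())
```

The output should match the kernel version you installed (5.19.0-41-generic in the configuration above) once you have rebooted into it.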
We failed to reproduce this problem on our machine (Max 1100). Environments:
```
bigdl's version: 2.5.0b20240118
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max 1100 OpenCL 3.0 NEO [23.30.26918.50]
[ext_oneapi_level_zero:gpu:0]...