ipex-llm

support >= 4GB SYCL compute buffer size for longer context length

Open ytliew82 opened this issue 9 months ago • 4 comments

Describe the bug SYCL Unified Shared Memory (USM) device allocations are limited to a maximum of 4 GB. ipex-llm reports an error if the calculated KV cache size exceeds 4 GB.

How to reproduce Use a computer set up for iGPU-only inference with >= 32 GB of RAM, so that no allocation issue is expected with a larger context size. I encountered this issue with the Gemma-3 model.

Steps to reproduce the error:

  1. Set the -c argument to a small value.
  2. Observe the size reported for the SYCL buffer; it is safe if it is less than 4 GB.
  3. Increase the -c argument until the expected buffer size exceeds 4 GB. The memory allocation error is then reported.
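
To get a rough idea of where step 3 crosses the limit, the KV cache size can be estimated from the context length. The sketch below uses the usual full-attention formula (one K and one V tensor per layer, per token). The model dimensions are hypothetical placeholders, not Gemma-3's real values, and the estimate ignores sliding-window attention and any extra compute buffers, so treat it as an approximation only.

```python
# Rough KV cache estimate. All model dimensions used here are HYPOTHETICAL
# placeholders for illustration; they are not taken from Gemma-3 or ipex-llm.

FOUR_GIB = 4 * 1024**3  # the 4 GB limit described in this issue


def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Estimate KV cache size: K and V, per layer, per token (f16 by default)."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * n_ctx


def max_context(limit_bytes, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Largest context length whose estimated KV cache does not exceed the limit."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return limit_bytes // per_token


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 8 KV heads, head dimension 128, f16 cache.
    dims = dict(n_layers=32, n_kv_heads=8, head_dim=128)
    for n_ctx in (4096, 16384, 32768, 65536):
        size = kv_cache_bytes(n_ctx, **dims)
        status = "ok" if size <= FOUR_GIB else "exceeds 4 GB"
        print(f"-c {n_ctx}: {size / 1024**3:.2f} GiB ({status})")
    print("max context under limit:", max_context(FOUR_GIB, **dims))
```

With real model dimensions substituted in, this gives an approximate upper bound for -c before a single allocation would pass 4 GB.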

Additional context I am running the Gemma 3 model with llama server, so I expect a similar issue with other MoE models.

Declaring multiple SYCL USM device instances might overcome this constraint, allowing a total buffer size above 4 GB for longer context lengths (a few thousand tokens and above, and cases with parallel enabled).
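
One way to picture the suggestion above is to group layers so that each group's cache fits in its own allocation below the limit. This is only a planning sketch under assumed numbers; it is not how ipex-llm or its SYCL backend actually allocates memory, and `plan_layer_buffers` is a hypothetical helper name.

```python
# Illustrative planning only: split per-layer KV cache across several
# allocations so that no single allocation exceeds the limit.
# plan_layer_buffers is a HYPOTHETICAL helper, not an ipex-llm or SYCL API.

FOUR_GIB = 4 * 1024**3


def plan_layer_buffers(n_layers, bytes_per_layer, limit_bytes=FOUR_GIB):
    """Return a list of (start_layer, end_layer) half-open ranges, one per buffer."""
    if n_layers <= 0 or bytes_per_layer <= 0:
        raise ValueError("n_layers and bytes_per_layer must be positive")
    layers_per_buffer = limit_bytes // bytes_per_layer
    if layers_per_buffer == 0:
        # A single layer is already too large; splitting by layer cannot help.
        raise ValueError("one layer alone exceeds the allocation limit")
    return [
        (start, min(start + layers_per_buffer, n_layers))
        for start in range(0, n_layers, layers_per_buffer)
    ]


if __name__ == "__main__":
    # Hypothetical case: 32 layers at 256 MiB of KV cache each (8 GiB total).
    for start, end in plan_layer_buffers(32, 256 * 1024**2):
        print(f"buffer for layers [{start}, {end})")
```

If a single layer's cache is itself larger than the limit, a per-layer split is not enough and the cache would have to be divided along another dimension.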

ytliew82 avatar Mar 31 '25 10:03 ytliew82

Hi ytliew82,

We previously encountered the same error with Gemma-3 4B on ARC, while Gemma-3 12B seemed to work fine. Are you using the 4B model in your test?

cyita avatar Apr 01 '25 08:04 cyita

I tested with Gemma-3 4B and 12B; both give the same error about not fitting into the device buffer. As a workaround, I currently run CPU-only inference and limit the -ngl argument to fit into the 4 GB device buffer.

Anyway, based on my understanding, the host/device/shared USM types mostly apply to dGPUs: https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2025-0/unified-shared-memory-allocations.html#USM-ALLOCATION

Since the iGPU shares the L3 cache with the CPU, could we optionally use a shared buffer instead of a device buffer when initialized with --device IGPU?

ytliew82 avatar Apr 01 '25 16:04 ytliew82

Hi ytliew82,

Thank you for the information! We'll provide updates once it's supported.

cyita avatar Apr 02 '25 02:04 cyita

+1

toncao avatar Apr 03 '25 09:04 toncao