Int4 dequantize kernel

Open zhewang1-intc opened this issue 7 months ago • 0 comments

Type of Change

feature or bug fix or documentation or others: feature

API changed or not: add a new kernel

Description

int4 dequantize kernel with very high bandwidth utilization.
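For readers unfamiliar with what such a kernel computes, here is a minimal reference sketch of int4 dequantization in plain Python. It is not the kernel from this PR. The packing layout (low nibble first), the implicit zero point of 8, the per-group scales, and the names `dequantize_int4` and `group_size` are all assumptions made for illustration.

```python
from typing import List


def dequantize_int4(packed: bytes, scales: List[float], group_size: int = 32) -> List[float]:
    """Unpack two 4-bit values per byte and scale them group-wise.

    Assumed layout (not taken from the PR): the low nibble holds the first
    weight, values are stored as unsigned 0..15 with an implicit zero point
    of 8, and there is one scale per `group_size` consecutive weights.
    """
    out: List[float] = []
    for byte in packed:
        # Low nibble first, then high nibble.
        for nibble in (byte & 0x0F, byte >> 4):
            # The scale index follows the position of the weight being written.
            scale = scales[len(out) // group_size]
            out.append((nibble - 8) * scale)
    return out


# Two bytes hold four weights; with group_size=2 each byte uses its own scale.
weights = dequantize_int4(bytes([0x0F, 0x80]), scales=[1.0, 2.0], group_size=2)
```

Each input byte yields two output values, so a kernel like this reads far fewer bytes than it writes. That is why it is typically limited by memory bandwidth rather than by arithmetic.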

MTL: kernel bandwidth ~85 GB/s (reported by VTune) against a hardware maximum of ~85 GB/s (reported by clpeak), i.e. nearly 100% utilization.

ARC 770M: at least 90% VRAM bandwidth utilization.
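The utilization figures above are the ratio of achieved to peak bandwidth. The sketch below shows that arithmetic. The helper names are hypothetical, and the byte count and timing would have to come from your own measurement (for example a profiler run); they are not values from this PR.

```python
def achieved_bandwidth_gbps(bytes_moved: int, seconds: float) -> float:
    """Achieved bandwidth in GB/s (1 GB = 1e9 bytes) for a timed kernel run."""
    return bytes_moved / seconds / 1e9


def utilization(achieved_gbps: float, peak_gbps: float) -> float:
    """Fraction of the peak bandwidth that the kernel actually reaches."""
    return achieved_gbps / peak_gbps


# Using the approximate MTL figures quoted above: ~85 GB/s achieved vs ~85 GB/s peak.
mtl_utilization = utilization(85.0, 85.0)
```

Because both MTL numbers are approximate and come from different tools (VTune and clpeak), the resulting ratio is an estimate rather than an exact measurement.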

Expected Behavior & Potential Risk

the expected behavior that is triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

zhewang1-intc, Jul 16 '24 06:07