[GPU] Recognize parameters as valid inputs for compressed weights

Open mdvoretc-intel opened this issue 3 months ago • 11 comments

Details:

  • The change allows parameters to be recognized alongside constants as valid weight inputs for transformations producing FullyConnectedCompressed nodes

Description of the issue:

At present, the FC_COMPRESSED_WEIGHT_PATTERN macro contains a pattern for dequantization of a constant integer weight. This pattern is used to recognize and fold cases where fused weight dequantization can be used, replacing them with FullyConnectedCompressed nodes. Because it expects a constant weight input, the pattern fails to recognize quantized LoRA weights, which are provided as parameters:

(Image: fc_compressed_param_before)

With the changes in this patch, these weights are recognized, so the transformations can proceed and produce nodes that then leverage oneDNN fused QGEMM for execution:

(Image: fc_compressed_param_after)
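To illustrate the idea behind the change, here is a minimal, self-contained sketch of a matcher that accepts either a constant or a parameter as the weight source. It is a toy model, not the OpenVINO pattern API or the actual macro: `Node`, `Kind`, `make`, `is_weight_source` and `matches_compressed_weight` are hypothetical names, and the assumed dequantization shape `Multiply(Subtract?(Convert(weight), zero_point), scale)` is an illustration rather than the exact pattern in FC_COMPRESSED_WEIGHT_PATTERN.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Toy node kinds for illustration only; these are NOT OpenVINO classes.
enum class Kind { Constant, Parameter, Convert, Subtract, Multiply, Other };

struct Node {
    Kind kind = Kind::Other;
    bool integer_type = false;  // assumed flag: true for integer (quantized) element types
    std::vector<std::shared_ptr<Node>> inputs;
};

using NodePtr = std::shared_ptr<Node>;

// Hypothetical helper that builds a node with the given inputs.
NodePtr make(Kind kind, std::vector<NodePtr> inputs = {}, bool integer_type = false) {
    auto node = std::make_shared<Node>();
    node->kind = kind;
    node->integer_type = integer_type;
    node->inputs = std::move(inputs);
    return node;
}

// A weight source is an integer Constant, or (when allowed) an integer Parameter.
bool is_weight_source(const NodePtr& node, bool allow_parameter) {
    if (!node || !node->integer_type) {
        return false;
    }
    return node->kind == Kind::Constant ||
           (allow_parameter && node->kind == Kind::Parameter);
}

// Matches the assumed shape Multiply(Subtract?(Convert(weight), zero_point), scale).
// The Subtract (zero-point) step is optional.
bool matches_compressed_weight(const NodePtr& root, bool allow_parameter) {
    if (!root || root->kind != Kind::Multiply || root->inputs.size() != 2) {
        return false;
    }
    NodePtr current = root->inputs[0];
    if (current && current->kind == Kind::Subtract && current->inputs.size() == 2) {
        current = current->inputs[0];  // skip the optional zero-point subtraction
    }
    if (!current || current->kind != Kind::Convert || current->inputs.size() != 1) {
        return false;
    }
    return is_weight_source(current->inputs[0], allow_parameter);
}
```

In this sketch, `allow_parameter = false` models the behaviour before the patch (a parameter-fed dequantization chain is rejected), while `allow_parameter = true` models the behaviour after it (the same chain matches and could be folded into a fused node).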

Tickets:

mdvoretc-intel avatar Oct 02 '25 12:10 mdvoretc-intel

build_jenkins

mdvoretc-intel avatar Oct 28 '25 16:10 mdvoretc-intel

build_jenkins

mdvoretc-intel avatar Oct 29 '25 11:10 mdvoretc-intel

@CuriousPanCake please review.

mdvoretc-intel avatar Nov 03 '25 16:11 mdvoretc-intel

@CuriousPanCake please review.

mdvoretc-intel avatar Nov 11 '25 14:11 mdvoretc-intel

@Lyamin-Roman please take a look

CuriousPanCake avatar Nov 14 '25 09:11 CuriousPanCake

Optimized kernels are called for the matrix multiplication itself, but there is a perf cost from transpose nodes for the parameter weights that cannot be optimized away.

mdvoretc-intel avatar Nov 14 '25 11:11 mdvoretc-intel

Consider adding new tests.

Tests added.

mdvoretc-intel avatar Nov 28 '25 14:11 mdvoretc-intel

@Lyamin-Roman please review.

mdvoretc-intel avatar Nov 28 '25 14:11 mdvoretc-intel

build_jenkins

p-durandin avatar Dec 03 '25 05:12 p-durandin

build_jenkins

susbhere avatar Dec 06 '25 06:12 susbhere

build_jenkins

p-durandin avatar Dec 08 '25 05:12 p-durandin