[GPU] Recognize parameters as valid inputs for compressed weights
Details:
- The change allows parameters to be recognized alongside constants as valid weight inputs for transformations producing FullyConnectedCompressed nodes
Description of the issue:
At present, the FC_COMPRESSED_WEIGHT_PATTERN macro contains a pattern for dequantization of a constant integer weight. This pattern is used to recognize and fold cases where fused weight dequantization can be applied, replacing them with FullyConnectedCompressed nodes. Because the pattern expects a constant weight input, it fails to recognize quantized LoRA weights, which are provided as parameters:
With the changes in this patch, these weights can be recognized, and the transformations can proceed and produce nodes that would then leverage oneDNN fused QGEMM for execution:
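The essence of the change can be sketched with a toy predicate (plain Python, not the actual OpenVINO pattern API; the node kinds and function names here are hypothetical, purely for illustration): the weight-matching check is widened from "input is a Constant" to "input is a Constant or a Parameter".

```python
# Toy illustration only -- this is NOT the real FC_COMPRESSED_WEIGHT_PATTERN
# macro or the OpenVINO transformation API; names are hypothetical.

class Node:
    """Minimal stand-in for a graph node with an operation kind."""
    def __init__(self, kind):
        self.kind = kind  # e.g. "Constant" or "Parameter"

# Old behavior: only constant weights matched the dequantization pattern,
# so quantized LoRA weights (Parameters) were left unfused.
def is_valid_weight_old(node):
    return node.kind == "Constant"

# New behavior: parameters are accepted as well, so the transformation can
# still fold the dequantization into a FullyConnectedCompressed node.
def is_valid_weight_new(node):
    return node.kind in ("Constant", "Parameter")

lora_weight = Node("Parameter")
print(is_valid_weight_old(lora_weight))  # False -> pattern match fails
print(is_valid_weight_new(lora_weight))  # True  -> transformation proceeds
```

With the old predicate a Parameter weight fails the match and the dequantization stays unfused; with the widened one it matches, and the folded node can then be dispatched to the fused oneDNN QGEMM path described above.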
Tickets:
build_jenkins
@CuriousPanCake please review.
@Lyamin-Roman please take a look
Optimized kernels are called for the matrix multiplication itself, but there is a performance cost from the transpose nodes applied to the parameter weights, which cannot be optimized away.
Consider adding new tests.
Tests added.
@Lyamin-Roman please review.
build_jenkins