WIP: Optimize benchmark load speeds and dequantization

Open elvircrn opened this issue 1 year ago • 0 comments

The purpose of this PR is to track the following features:

  • Optimize 3-bit dequantization.
  • Add support for multi-batch inference.
  • Add support for efficient spqr-to-dense matrix dequantization.
  • Add support for running per-tensor benchmarks directly through C++.

elvircrn, Dec 22 '24 13:12