SpQR
WIP: Optimize benchmark load speeds and dequantization
The purpose of this PR is to track the following features:
- Optimize 3-bit dequantization.
- Add support for multi-batch inference.
- Add support for efficient SpQR-to-dense matrix dequantization.
- Add support for running per-tensor benchmarks directly through C++.
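As a rough illustration of what the two dequantization items involve, here is a minimal C++ sketch. It relies on a hypothetical storage layout that is not necessarily the format this PR or the SpQR kernels use. In that layout, 3-bit codes are packed LSB-first into a byte stream (row-major, 8 codes per 3 bytes), and each group of `group_size` weights in a row has one float scale and zero-point. Outliers are kept in CSR form and added on top of the dequantized base. Whether the real format adds outliers or replaces the base value is also an assumption here. All names (`unpack3`, `CsrOutliers`, `spqr_to_dense_sketch`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout, for illustration only (not the actual SpQR storage format):
// - 3-bit codes packed LSB-first into a byte stream, row-major.
// - One float scale and zero-point per group of `group_size` weights in a row.
// - Outliers in CSR form, added on top of the dequantized base (assumption).

// Extract the k-th 3-bit code from an LSB-first packed byte stream.
inline uint8_t unpack3(const std::vector<uint8_t>& packed, std::size_t k) {
    std::size_t bit = k * 3;
    std::size_t byte = bit / 8;
    unsigned shift = static_cast<unsigned>(bit % 8);
    // A code starting at bit offset 6 or 7 straddles two bytes, so read 16 bits.
    unsigned word = packed[byte];
    if (byte + 1 < packed.size()) word |= static_cast<unsigned>(packed[byte + 1]) << 8;
    return static_cast<uint8_t>((word >> shift) & 0x7u);
}

// Hypothetical CSR container for high-precision outlier weights.
struct CsrOutliers {
    std::vector<int> row_ptr;   // size rows + 1
    std::vector<int> col_idx;   // column of each outlier
    std::vector<float> values;  // value added to the dense weight
};

// Reconstruct a dense rows x cols matrix (row-major) from 3-bit codes plus outliers.
std::vector<float> spqr_to_dense_sketch(const std::vector<uint8_t>& packed,
                                        const std::vector<float>& scales,
                                        const std::vector<float>& zeros,
                                        const CsrOutliers& outliers,
                                        std::size_t rows, std::size_t cols,
                                        std::size_t group_size) {
    assert(group_size > 0 && cols % group_size == 0);
    std::size_t groups_per_row = cols / group_size;
    assert(scales.size() == rows * groups_per_row && zeros.size() == scales.size());
    assert(packed.size() * 8 >= rows * cols * 3);
    assert(outliers.row_ptr.size() == rows + 1);

    std::vector<float> dense(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        // Dequantize the base weights: w = scale * (q - zero), per group.
        for (std::size_t c = 0; c < cols; ++c) {
            std::size_t g = r * groups_per_row + c / group_size;
            float q = static_cast<float>(unpack3(packed, r * cols + c));
            dense[r * cols + c] = scales[g] * (q - zeros[g]);
        }
        // Add this row's sparse outliers on top of the quantized base.
        for (int k = outliers.row_ptr[r]; k < outliers.row_ptr[r + 1]; ++k) {
            std::size_t c = static_cast<std::size_t>(outliers.col_idx[k]);
            dense[r * cols + c] += outliers.values[k];
        }
    }
    return dense;
}
```

The per-element loop keeps the arithmetic easy to check. An optimized 3-bit path would more likely decode 8 codes from each 3-byte chunk at once and vectorize the scale/zero step.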