hivemind
[WIP] Add support for quantization with bitsandbytes
This PR integrates blockwise quantization from bitsandbytes as a new compression mechanism in Hivemind. Importantly, it is an optional compression protocol: users should only have to install the external library if they are actually going to use it, hence the "conditional import"/"extra dependency" parts.
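For readers unfamiliar with the term, here is a minimal pure-Python sketch of what blockwise quantization does: the tensor is split into fixed-size blocks, and each block is scaled by its own absolute maximum before being rounded to 8-bit codes. This is a simplified linear variant written for illustration only; the function names and the tiny `block_size` are assumptions, and the actual bitsandbytes kernels differ (for instance, in how 8-bit codes map back to values).

```python
from typing import List, Tuple


def quantize_blockwise(values: List[float], block_size: int = 4) -> Tuple[List[int], List[float]]:
    """Illustrative linear absmax quantization, one scale per block (not the bitsandbytes code)."""
    codes: List[int] = []
    absmax: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Each block gets its own scale, so one outlier only hurts its own block.
        scale = max(abs(v) for v in block) or 1.0  # avoid dividing by zero for all-zero blocks
        absmax.append(scale)
        codes.extend(round(v / scale * 127) for v in block)  # signed codes in [-127, 127]
    return codes, absmax


def dequantize_blockwise(codes: List[int], absmax: List[float], block_size: int = 4) -> List[float]:
    """Invert quantize_blockwise up to rounding error."""
    return [code / 127 * absmax[i // block_size] for i, code in enumerate(codes)]


values = [0.1, -0.2, 0.05, 0.4, 100.0, -50.0, 25.0, 0.0]
codes, absmax = quantize_blockwise(values)
restored = dequantize_blockwise(codes, absmax)
```

With a single global scale, the small values in the first block would all collapse to zero next to the `100.0` outlier; with per-block scales they keep most of their precision.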
The code on the Hivemind side is pretty simple, but it'd be cool to have a way to include a CPU-only build of bitsandbytes as a dependency, so that we can both include it without checking for a CUDA version and test the integration in GHA. @TimDettmers has granted me access to the bitsandbytes repo, so I'm going to work on that first before marking this PR as ready to merge.
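The "conditional import" idea from the description can be sketched roughly as follows. This is an assumption about the general pattern, not the code in this PR: `optional_import` is a hypothetical helper, and the error message wording is made up for illustration.

```python
import importlib
import importlib.util
from types import ModuleType


def optional_import(module_name: str, purpose: str) -> ModuleType:
    """Import an optional dependency only when it is needed (hypothetical helper)."""
    # find_spec checks whether the top-level package is installed without importing it.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{purpose} requires the optional package '{module_name}'; "
            f"install it to enable this feature."
        )
    return importlib.import_module(module_name)


# Intended usage inside a compression method, so that importing the
# compression package itself never fails when the extra is absent:
#     bnb = optional_import("bitsandbytes", "Blockwise 8-bit compression")
```

Deferring the import to the call site means users who never select this compression type are not affected by whether the library (or a matching CUDA build) is installed.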
How's the PR going? Need any help?
Not sure I need any help, since we're mostly waiting for the new bitsandbytes release.
Codecov Report
Merging #490 (f311943) into master (6395e89) will decrease coverage by 0.04%. The diff coverage is 89.74%.
@@ Coverage Diff @@
## master #490 +/- ##
==========================================
- Coverage 86.31% 86.27% -0.05%
==========================================
Files 81 81
Lines 7887 7919 +32
==========================================
+ Hits 6808 6832 +24
- Misses 1079 1087 +8
| Impacted Files | Coverage Δ | |
|---|---|---|
| hivemind/compression/quantization.py | 94.59% <87.50%> (-2.88%) | :arrow_down: |
| hivemind/compression/__init__.py | 100.00% <100.00%> (ø) | |
| hivemind/compression/serialization.py | 100.00% <100.00%> (ø) | |
| hivemind/averaging/matchmaking.py | 88.35% <0.00%> (-0.90%) | :arrow_down: |
| hivemind/averaging/averager.py | 88.27% <0.00%> (-0.24%) | :arrow_down: |
LGTM, please merge at will