aphrodite-engine
[Feature]: tensor parallelism support for bnb quantization (via IBM's fork)
🚀 The feature, motivation and pitch
I don't know whether merging this is feasible or worthwhile, since the trees may have diverged too far, but cherry-picking commits for projects I don't fully understand is somehow a pastime of mine, so ...
Alternatives
I could always use one of the other 8.4234234*10^23 quantization methods, but, hey, variety is the spice of life--or something.
Additional context
The cherry-picked tensor-parallel bnb support doesn't work for pre-quantized models. 🎉~