[Feature]: tensor parallelism support for bnb quantization (via IBM's fork)

Open BlairSadewitz opened this issue 4 months ago • 3 comments

🚀 The feature, motivation and pitch

I don't know if it's feasible or worthwhile to merge this, as maybe the trees are too divergent, etc., but cherry-picking commits for projects I don't fully understand is somehow a pastime for me, so ...

Alternatives

I could always use one of the other 8.4234234*10^23 quantization methods, but, hey, variety is the spice of life--or something.

Additional context

It doesn't work for pre-quantized models. 🎉~

BlairSadewitz · Sep 28 '24 16:09