
GPUs

Open fakerybakery opened this issue 2 years ago • 1 comment

Hi, great repo! You mentioned you need quite a few A100s. If this model is ~50B parameters and people can run Llama 2 70B on 1xA100, why does this take so much compute? Thank you!

fakerybakery avatar Dec 09 '23 01:12 fakerybakery

I've never tried Llama 70B, but this is running in fp16 without any quantization. That might be part of it?

vikhyat avatar Dec 09 '23 02:12 vikhyat
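That fp16-vs-quantization point can be checked with back-of-the-envelope arithmetic. Below is a minimal sketch (not from the repo) that estimates the VRAM needed for model weights alone, ignoring activations and KV cache. It assumes Mixtral 8x7B has roughly 46.7B total parameters and that the Llama 2 70B comparison involves 4-bit quantization; both figures are assumptions for illustration.

```python
# Hypothetical back-of-the-envelope VRAM estimate for model weights only
# (excludes activations and KV cache, which add further overhead).
def weights_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GiB."""
    return n_params * bytes_per_param / 2**30

# Assumed parameter counts (not confirmed in this thread):
mixtral_fp16 = weights_gib(46.7e9, 2.0)    # fp16 = 2 bytes per parameter
llama70b_4bit = weights_gib(70e9, 0.5)     # 4-bit quantized = 0.5 bytes/param

print(f"Mixtral fp16 weights:    ~{mixtral_fp16:.0f} GiB")
print(f"Llama 70B 4-bit weights: ~{llama70b_4bit:.0f} GiB")
```

Under these assumptions the fp16 weights alone (~87 GiB) overflow a single 80 GB A100, while a 4-bit Llama 2 70B (~33 GiB) fits comfortably on one, which would account for the difference in hardware requirements.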