bloomz.cpp
Quantizing and running inference on BLOOM-176B required some changes:
- Most issues stem from the fact that the embedding layer (250880 x 14336) has more elements than fit in a 32-bit signed integer.
- This affects main, quantize, and the ggml code itself.
- A second issue is that main seems to underestimate the amount of memory needed for evaluation.
- That is not properly fixed; I have simply added 5 GB for the weights and doubled the size of the context used for model evaluation.

Being far from proficient in C++, I expect these changes will need to be cleaned up by someone experienced with ggml and C++.