bloomz.cpp
Quantizing and running inference on BLOOM-176B required some changes:
- Most issues stem from the fact that the embedding layer (250880 x 14336) has more elements than fit in a 32-bit signed integer.
- This affects main, quantize, and the ggml code itself.
- A second issue is that main seems to underestimate the amount of memory needed for evaluation.
- That is not properly fixed; I have simply added 5 GB for the weights and doubled the size of the context used for model evaluation.

Being far from proficient in C++, I expect these changes will need to be cleaned up by someone experienced with ggml and C++.