Bendr Radrigues

Results: 30 comments of Bendr Radrigues

Thanks @enyst! I haven't found a way with litellm to shape outgoing call rates (maybe there is a way to make it act smartly on HTTP 429 / 529 responses?), so...
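In case it helps, here is a minimal sketch of what I mean by acting on those codes client-side: a backoff wrapper around litellm.completion that retries on 429 / 529. The helper name, the status_code check, and the model string are my own assumptions, not anything litellm documents as the official way to do this.

```python
# Minimal sketch, not the real OpenHands code: retry a litellm call with
# exponential backoff when the provider answers 429 (rate limited) or
# 529 (overloaded). The status_code attribute check is an assumption.
import time
import litellm

def completion_with_backoff(max_retries=5, base_delay=2.0, **kwargs):
    for attempt in range(max_retries):
        try:
            return litellm.completion(**kwargs)
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status not in (429, 529) or attempt == max_retries - 1:
                raise
            # Wait 2s, 4s, 8s, ... before retrying the rate-limited call.
            time.sleep(base_delay * (2 ** attempt))

resp = completion_with_backoff(
    model="anthropic/claude-3-5-sonnet-20240620",  # example model id
    messages=[{"role": "user", "content": "hello"}],
)
```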

Apologies for the lag @krism142, I haven't gone this route to set rate limits, but according to https://docs.all-hands.dev/modules/usage/llms one could set them using environment variables which can be passed when starting the container...
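Roughly what I had in mind, sketched with the docker-py SDK instead of a raw docker run; the image tag and the LLM_* retry variable names are from memory of that docs page, so please double-check them there before relying on this.

```python
# Sketch only: start the OpenHands container with retry/rate-limit settings
# passed as environment variables. Image tag and variable names are
# assumptions based on the linked docs page - verify before use.
import docker

client = docker.from_env()
container = client.containers.run(
    "ghcr.io/all-hands-ai/openhands:latest",  # assumed image name
    environment={
        "LLM_MODEL": "anthropic/claude-3-5-sonnet-20240620",
        "LLM_API_KEY": "sk-...",
        "LLM_NUM_RETRIES": "8",        # assumed names for the retry knobs
        "LLM_RETRY_MIN_WAIT": "15",
        "LLM_RETRY_MAX_WAIT": "120",
    },
    ports={"3000/tcp": 3000},
    detach=True,
)
print(container.short_id)
```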

Thank you for the quick reply! I have tried running the script locally - this was where I stumbled, it said it needs Git LFS etc. - it seems it would...

@borzunov, thank you, really appreciated! For what it's worth, I have tried to update the convert_model.py script to work locally and it produced a bunch of files in the target...

Thank you, indeed 0.5.3 does not have this issue anymore!

Here is where it seems to crash...
$ g++ -I. -I./examples -g -std=c++11 -fPIC -pthread quantize.cpp ggml.o utils.o -o quantize
$ gdb --args ./quantize ./models/bloom/ggml-model-bloom-f16.bin ./models/bloom/ggml-model-bloomz-f16-q4_0.bin 2
Reading symbols from...

Quantization for 176B works with this commit: https://github.com/barsuna/bloomz.cpp/commit/2d0e478c653d078554af0188c90c7081ff0b3059
Inference is also working.

./main -m models/bloom/ggml-model-bloom-f16-q4_0.bin -t 96 -p "The most beautiful question is" -n 20
main: seed = 1680447842
bloom_model_load: loading model from 'models/bloom/ggml-model-bloom-f16-q4_0.bin' - please wait ...
bloom_model_load: n_vocab = 250880...