Bendr Radrigues

Results: 30 comments of Bendr Radrigues

Thanks @enyst! I haven't found a way with litellm to shape outgoing call rates (maybe there is a way to make it act smartly on HTTP 429 / 529 responses?), so...
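In case it helps, here is a minimal sketch of what I mean by acting on those codes client-side: a backoff wrapper around litellm.completion that retries on 429 / 529. The helper name, the status_code check, and the model string are my own assumptions, not anything litellm documents as the official way to do this.

```python
# Minimal sketch, not the real OpenHands code: retry a litellm call with
# exponential backoff when the provider answers 429 (rate limited) or
# 529 (overloaded). The status_code attribute check is an assumption.
import time
import litellm

def completion_with_backoff(max_retries=5, base_delay=2.0, **kwargs):
    for attempt in range(max_retries):
        try:
            return litellm.completion(**kwargs)
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status not in (429, 529) or attempt == max_retries - 1:
                raise
            # Wait 2s, 4s, 8s, ... before retrying the rate-limited call.
            time.sleep(base_delay * (2 ** attempt))

resp = completion_with_backoff(
    model="anthropic/claude-3-5-sonnet-20240620",  # example model id
    messages=[{"role": "user", "content": "hello"}],
)
```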

Apologies for the lag @krism142, I haven't gone this route to set rate limits, but according to https://docs.all-hands.dev/modules/usage/llms one could set them using environment variables which can be passed when starting the container...
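Roughly what I had in mind, sketched with the docker-py SDK instead of a raw docker run; the image tag and the LLM_* retry variable names are from memory of that docs page, so please double-check them there before relying on this.

```python
# Sketch only: start the OpenHands container with retry/rate-limit settings
# passed as environment variables. Image tag and variable names are
# assumptions based on the linked docs page - verify before use.
import docker

client = docker.from_env()
container = client.containers.run(
    "ghcr.io/all-hands-ai/openhands:latest",  # assumed image name
    environment={
        "LLM_MODEL": "anthropic/claude-3-5-sonnet-20240620",
        "LLM_API_KEY": "sk-...",
        "LLM_NUM_RETRIES": "8",        # assumed names for the retry knobs
        "LLM_RETRY_MIN_WAIT": "15",
        "LLM_RETRY_MAX_WAIT": "120",
    },
    ports={"3000/tcp": 3000},
    detach=True,
)
print(container.short_id)
```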

Thank you for the quick reply! I have tried running the script locally - this was where I stumbled, it said it needs Git LFS etc. - it seems it would...

@borzunov, thank you, really appreciated! For what it's worth, I have tried to update the convert_model.py script to work locally and it produced a bunch of files in the target...

Thank you, indeed 0.5.3 does not have this issue anymore!

Here is where it seems to crash...
$ g++ -I. -I./examples -g -std=c++11 -fPIC -pthread quantize.cpp ggml.o utils.o -o quantize
$ gdb --args ./quantize ./models/bloom/ggml-model-bloom-f16.bin ./models/bloom/ggml-model-bloomz-f16-q4_0.bin 2
Reading symbols from...

Quantization for 176B works with this commit: https://github.com/barsuna/bloomz.cpp/commit/2d0e478c653d078554af0188c90c7081ff0b3059
Inference is also working.

./main -m models/bloom/ggml-model-bloom-f16-q4_0.bin -t 96 -p "The most beautiful question is" -n 20
main: seed = 1680447842
bloom_model_load: loading model from 'models/bloom/ggml-model-bloom-f16-q4_0.bin' - please wait ...
bloom_model_load: n_vocab = 250880...