Bendr Radrigues
The above was produced with this commit: https://github.com/barsuna/bloomz.cpp/commit/2d0e478c653d078554af0188c90c7081ff0b3059
@Dilip-17 there was the same question on another issue; I added some pointers there: https://github.com/assafelovic/gpt-researcher/issues/520. The challenge is mostly not how to run it, but having the GPU memory necessary to run...
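For a rough sense of the memory requirement mentioned here, a back-of-the-envelope estimate is parameter count times bytes per parameter, plus some overhead for KV cache and activations. A minimal sketch (the 20% overhead factor is an assumption, not a measured number):

```python
def estimate_vram_gb(params_billions: float, bits_per_param: int, overhead: float = 0.2) -> float:
    """Rough VRAM estimate: weights plus a fudge factor for KV cache/activations."""
    weight_gb = params_billions * 1e9 * bits_per_param / 8 / 1e9
    return weight_gb * (1 + overhead)

# llama3-70b at fp16 vs. 4-bit quantization
print(f"70B fp16 : ~{estimate_vram_gb(70, 16):.0f} GB")  # ~168 GB
print(f"70B 4-bit: ~{estimate_vram_gb(70, 4):.0f} GB")   # ~42 GB
```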
To its credit, llama3 worked pretty much out of the box with gpt-researcher (the only tweak needed was the prompt change above). It seems it is possible to stretch the context...
Thanks @ElishaKay, indeed the timeout happens during busy periods on the server side (generally during subtopic generation for me). The computer itself is nowhere near overloaded (CPU/mem/IO-wise) -...
Idk, there is still a risk of timeout (though a lesser one, perhaps?). I wonder if there are ways to control the length of the timeout on the gpt-researcher side....
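One way to experiment with this, assuming the LLM calls go through the standard openai Python client (which does accept a `timeout` argument), is to raise the client-side timeout when pointing at a slow local server. A sketch, not gpt-researcher's actual configuration:

```python
from openai import OpenAI

# Point the client at a local OpenAI-compatible server and allow long
# generations; 600s is an arbitrary choice, tune to taste.
client = OpenAI(
    base_url="http://localhost:1234/v1",  # lm-studio's default port
    api_key="not-needed",                 # local servers usually ignore the key
    timeout=600.0,                        # seconds
)

resp = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```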
There was a post here: https://github.com/assafelovic/gpt-researcher/issues/395 - use lm-studio for llama3; for embeddings, install ollama with some small model (lm-studio had embeddings too, but a different API format). I'm using...
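For reference, the embeddings half of that split setup looks roughly like this: requests go to ollama's native embeddings endpoint rather than an OpenAI-style one. The model name below is just an example; any small embedding model pulled into ollama works:

```python
import requests

# ollama's native embeddings API (default port 11434)
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": "hello world"},
)
embedding = resp.json()["embedding"]
print(len(embedding))  # dimensionality depends on the model
```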
(I'm not in any way positioned to implement the quantization support, but wanted to share some notes with those planning to work on it.) Background: I thought the tinygrad example already...
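For anyone picking this up, the core of weight-only quantization is small. A minimal symmetric int8 sketch in numpy (per-tensor scaling for brevity; real implementations usually scale per row or per group):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~= q * scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize(q, s)).max())  # small reconstruction error
```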
Thank you @AlexCheema! Ack on 1. I realize now these are static numbers. If these are determined dynamically, it seems sensible to also establish bus bandwidth and GPU memory bandwidth -...
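A crude way to establish such numbers dynamically is to time a large copy. A sketch for host memory bandwidth (a real probe would also time host-to-device transfers and device-side copies):

```python
import time
import numpy as np

def host_mem_bandwidth_gbps(size_mb: int = 512, iters: int = 5) -> float:
    """Estimate host memory bandwidth by timing large array copies."""
    buf = np.frombuffer(np.random.bytes(size_mb * 2**20), dtype=np.uint8)
    best = float("inf")
    for _ in range(iters):
        t0 = time.perf_counter()
        _ = buf.copy()  # reads src and writes dst
        best = min(best, time.perf_counter() - t0)
    # a copy touches ~2x the buffer size (read + write)
    return 2 * size_mb / 1024 / best

print(f"~{host_mem_bandwidth_gbps():.1f} GB/s")
```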
Thank you @AlexCheema! On 3 - this approach seems limited to 2 processes; we still need something different for when there are >2 instances. I tried to put each...
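On the multi-instance point, one pattern that generalizes past 2 processes is launching one child process per GPU with CUDA_VISIBLE_DEVICES pinned per child, so each instance sees exactly one device. A sketch (worker.py is hypothetical, standing in for whatever each instance runs):

```python
import os
import subprocess

NUM_GPUS = 4  # assumption: one instance per local GPU

procs = []
for gpu in range(NUM_GPUS):
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu)  # each child sees only this GPU
    procs.append(subprocess.Popen(["python", "worker.py"], env=env))

for p in procs:
    p.wait()
```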
Hey @assafelovic, it is hardly rational, I know - it is an 'operating in a broken world' kind of thing. I found some odd issues where APIs wouldn't handle correctly...