Bendr Radrigues

Results: 21 comments by Bendr Radrigues

Trying this, I get an error that the V100's compute capability is below 7.5, which is insufficient... In the case of 16-bit weights, I'm not even sure why it is needed? NotImplementedError:...
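
For reference, a hypothetical sketch of the kind of gate that raises this NotImplementedError (illustrative only, not the library's actual code). V100 reports compute capability (7, 0), so any sm_75-or-newer requirement fails:

```python
import torch

# Hypothetical capability gate; a 7.5 threshold is typical of kernels that
# need Turing-or-newer hardware. V100 is (7, 0), so this raises.
major, minor = torch.cuda.get_device_capability(0)
if (major, minor) < (7, 5):
    raise NotImplementedError(
        f"Compute capability {major}.{minor} < 7.5; this kernel path needs "
        "newer hardware. Fall back to 16-bit weights or another backend."
    )
```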

Thanks Arno, that works. And indeed it is visibly faster than the 4-bit quantized flavor running on one V100!

Theoretically, 48 GB across the 2x 3090s should be enough to load a 40B model (maybe even in 8-bit), but I'm not sure if this is expected to work out of...
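
For the arithmetic: 40B parameters take roughly 80 GB in 16-bit and roughly 40 GB in 8-bit, so 48 GB only fits the quantized variant, and even then activations and KV cache leave little headroom. A minimal sketch of what I'd try, assuming a bitsandbytes-backed 8-bit load sharded across both GPUs (the model id is a placeholder, not necessarily the checkpoint discussed here):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-40b"  # placeholder 40B checkpoint

# load_in_8bit keeps weights at ~1 byte/param (~40 GB total);
# device_map="auto" shards the layers across the two 24 GB cards.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    torch_dtype=torch.float16,  # dtype for the non-quantized parts
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```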

I don't see the same issue (tried 0.4.0 as well, with llama3-70b on local ollama, with your prompt). For me, OpenDevin gets into trouble at another step, but I agree we...

Thanks @all for your suggestions; I will do some more testing with these additional tips. @enyst, inside the containers (0.4.0) I see no poetry; this is the version of...

Thanks @gbenaa, I don't see the JSON error in 0.5.2 anymore. The model does indeed at times produce empty output... something to debug on the LLM side, not an...

Thank you for your reply @justheuristic, this helps! I'm not concerned about the tests, I just thought I'd use them to see what's wrong with my setup. Knowing this is generally not...

I have commented out --torch_dtype float16 --compression $COMPRESSION when starting the swarm and workers... and the issue is gone. Not sure if it was the complete restart of everything, or the change of dtype...
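
If it was the dtype, here is a tiny illustration of why float16 alone can break things (this only demonstrates the numeric range, not the swarm's actual code path):

```python
import torch

# float16 tops out at 65504; values that are fine in float32
# saturate to inf after the cast.
print(torch.finfo(torch.float16).max)  # 65504.0
x = torch.tensor([65504.0, 70000.0], dtype=torch.float32)
print(x.to(torch.float16))  # tensor([65504., inf], dtype=torch.float16)
```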

Thanks @justheuristic! Regarding compression, it was set to 'NONE', since this is all on one server. I'll maybe try to add a check for inputs exceeding ±6.5e4. Performance-wise, I'm getting...
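
Something like this is what I have in mind for the check (a hypothetical helper, not part of the project; 6.5504e4 is the exact float16 maximum):

```python
import torch

FP16_MAX = torch.finfo(torch.float16).max  # 65504.0, the ~6.5e4 above

def assert_fp16_safe(t: torch.Tensor, name: str = "input") -> None:
    """Hypothetical guard: fail loudly before values leave the float16 range."""
    peak = t.abs().max().item()
    if peak > FP16_MAX:
        raise ValueError(
            f"{name} peaks at {peak:.3e}, beyond float16 max {FP16_MAX:.3e}; "
            "expect inf/nan after the cast"
        )
```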

I've also hit this recently with the Anthropic API. They have requests/minute, tokens/minute, and requests/day limits, and OpenDevin quickly (within a minute) hit the tokens/minute limit. Since in this case the rate is known, perhaps...
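
What I mean is a client-side throttle against the known budget; a rough sketch, assuming a fixed tokens/minute limit (all names hypothetical, not OpenDevin's actual code):

```python
import time
from collections import deque

class TokenBudget:
    """Hypothetical sliding-window throttle for a known tokens/minute limit."""

    def __init__(self, tokens_per_minute: int):
        self.limit = tokens_per_minute
        self.events: deque[tuple[float, int]] = deque()  # (timestamp, tokens)

    def acquire(self, tokens: int) -> None:
        """Block until `tokens` fits within the last 60 seconds of usage."""
        while True:
            now = time.monotonic()
            while self.events and now - self.events[0][0] > 60:
                self.events.popleft()
            if sum(n for _, n in self.events) + tokens <= self.limit:
                self.events.append((now, tokens))
                return
            time.sleep(1)  # wait for older usage to drain out of the window

# Usage: budget = TokenBudget(40_000); call budget.acquire(estimated_tokens)
# before each API request.
```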