Bendr Radrigues

Results: 21 comments by Bendr Radrigues

Trying this, I get an error that the V100's compute capability is below 7.5, which is insufficient... In the case of 16-bit weights, I'm not even sure why it is needed? NotImplementedError:...
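
For reference, a hypothetical sketch of the kind of gate that raises this NotImplementedError (illustrative only, not the library's actual code). V100 reports compute capability (7, 0), so any sm_75-or-newer requirement fails:

```python
import torch

# Hypothetical capability gate; a 7.5 threshold is typical of kernels that
# need Turing-or-newer hardware. V100 is (7, 0), so this raises.
major, minor = torch.cuda.get_device_capability(0)
if (major, minor) < (7, 5):
    raise NotImplementedError(
        f"Compute capability {major}.{minor} < 7.5; this kernel path needs "
        "newer hardware. Fall back to 16-bit weights or another backend."
    )
```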

Thanks Arno, that works. And indeed it is visibly faster than the 4-bit quantized flavor running on one V100!

Theoretically, 48 GB across the 2x 3090s should be enough to load a 40B model (maybe even in 8-bit), but I'm not sure if this is expected to work out of...
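
For the arithmetic: 40B parameters take roughly 80 GB in 16-bit and roughly 40 GB in 8-bit, so 48 GB only fits the quantized variant, and even then activations and KV cache leave little headroom. A minimal sketch of what I'd try, assuming a bitsandbytes-backed 8-bit load sharded across both GPUs (the model id is a placeholder, not necessarily the checkpoint discussed here):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-40b"  # placeholder 40B checkpoint

# load_in_8bit keeps weights at ~1 byte/param (~40 GB total);
# device_map="auto" shards the layers across the two 24 GB cards.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    torch_dtype=torch.float16,  # dtype for the non-quantized parts
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```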

I don't see the same issue (tried 0.4.0 as well, with llama3-70b on local ollama, with your prompt). For me, OpenDevin gets into trouble at another step, but I agree we...

Thanks @all for your suggestions; I will do some more testing with these additional tips. @enyst, inside the containers (0.4.0) I see no poetry; this is the version of...

Thanks @gbenaa, I don't see the JSON error in 0.5.2 anymore. The model does indeed at times produce empty output... something to debug on the LLM side, not an...

Thank you for your reply @justheuristic, this helps! I'm not concerned about the tests, I just thought I'd use them to see what's wrong with my setup. Knowing this is generally not...

I have commented out --torch_dtype float16 --compression $COMPRESSION when starting the swarm and workers... and the issue is gone. Not sure if it was the complete restart of everything, or the change of dtype...
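
If it was the dtype, here is a tiny illustration of why float16 alone can break things (this only demonstrates the numeric range, not the swarm's actual code path):

```python
import torch

# float16 tops out at 65504; values that are fine in float32
# saturate to inf after the cast.
print(torch.finfo(torch.float16).max)  # 65504.0
x = torch.tensor([65504.0, 70000.0], dtype=torch.float32)
print(x.to(torch.float16))  # tensor([65504., inf], dtype=torch.float16)
```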

Thanks @justheuristic! Regarding compression, it was set to 'NONE', since this is all on one server. I'll maybe try to add a check for inputs exceeding ±6.5e4. Performance-wise, I'm getting...
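
Something like this is what I have in mind for the check (a hypothetical helper, not part of the project; 6.5504e4 is the exact float16 maximum):

```python
import torch

FP16_MAX = torch.finfo(torch.float16).max  # 65504.0, the ~6.5e4 above

def assert_fp16_safe(t: torch.Tensor, name: str = "input") -> None:
    """Hypothetical guard: fail loudly before values leave the float16 range."""
    peak = t.abs().max().item()
    if peak > FP16_MAX:
        raise ValueError(
            f"{name} peaks at {peak:.3e}, beyond float16 max {FP16_MAX:.3e}; "
            "expect inf/nan after the cast"
        )
```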

I've also hit this recently with the Anthropic API. They have requests/minute, tokens/minute, and requests/day limits, and OpenDevin quickly (within a minute) hit the tokens/minute limit. Since in this case the rate is known, perhaps...
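
What I mean is a client-side throttle against the known budget; a rough sketch, assuming a fixed tokens/minute limit (all names hypothetical, not OpenDevin's actual code):

```python
import time
from collections import deque

class TokenBudget:
    """Hypothetical sliding-window throttle for a known tokens/minute limit."""

    def __init__(self, tokens_per_minute: int):
        self.limit = tokens_per_minute
        self.events: deque[tuple[float, int]] = deque()  # (timestamp, tokens)

    def acquire(self, tokens: int) -> None:
        """Block until `tokens` fits within the last 60 seconds of usage."""
        while True:
            now = time.monotonic()
            while self.events and now - self.events[0][0] > 60:
                self.events.popleft()
            if sum(n for _, n in self.events) + tokens <= self.limit:
                self.events.append((now, tokens))
                return
            time.sleep(1)  # wait for older usage to drain out of the window

# Usage: budget = TokenBudget(40_000); call budget.acquire(estimated_tokens)
# before each API request.
```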