Nicolas Patry
GitHub did not provide an action runner for M1 at the time, so builds were manual (and infrequent). Any reason you cannot upgrade to `0.13.2` or `0.12.6`? But yes...
Hmm interesting, could you try force-installing 0.12.6 and see if that fixes it? If you could share your env (Python version + hardware (M1 I guess) + requirements.txt)...
I got confused with 0.11.6, sorry! And I don't see the builds for 0.12 for arm, I'm guessing we moved to 0.13 first. TBH there "shouldn't" be any major...
You're right, it's not that important. /s Just because you haven't been affected (to your knowledge) doesn't mean it's not real. We have been receiving reports of actual attacks though,...
@monuminu Yes, you need to adjust all parameters so that the requests can fit in the VRAM left over after the model is loaded.
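A rough way to reason about "fitting in the VRAM left over" is to estimate the KV cache cost per token and divide the free memory by it. The sketch below is only a back-of-the-envelope illustration: the function names, the model shapes, and the 8 GiB figure are hypothetical, and it ignores activations, allocator overhead, and fragmentation, so a real server will fit fewer tokens than this.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Bytes of KV cache needed for one token.

    The factor 2 accounts for storing both a key and a value tensor per layer.
    dtype_bytes=2 assumes fp16/bf16 cache entries (an assumption).
    """
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def max_tokens_in_budget(free_vram_bytes, bytes_per_token):
    """Upper bound on the total tokens (across all requests) that fit."""
    return free_vram_bytes // bytes_per_token


# Illustrative shapes only: 32 layers, 32 KV heads, head dimension 128.
per_token = kv_cache_bytes_per_token(num_layers=32, num_kv_heads=32, head_dim=128)

# Hypothetical: 8 GiB of VRAM remain after the weights are loaded.
free_vram = 8 * 1024**3
budget = max_tokens_in_budget(free_vram, per_token)

print(f"{per_token} bytes per token, at most {budget} tokens in flight")
```

If the configured token limits add up to more than this kind of bound, the limits need to come down (or the model needs to take less memory) before the requests can fit.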
> fairly similar to llama Seems exactly the same at first glance, just fork it and make it look like llama maybe?
The Warmup phase (the one crashing) is trying to allocate the MAXIMUM possible request, mimicking your server under load. > text_generation_launcher: Method Warmup encountered an error. We try to...
Yes, in general though PyTorch will allocate memory however it likes, so the numbers reported by `nvidia-smi` might not reflect what is actually necessary.
0.9.3 had issues because we were using AsyncMalloc, and it seems PyTorch doesn't do a great job at tracking those allocations, leading to all sorts of issues everywhere, we did...
> There are lots of models on HF which are only offered in either F16 or exl2 format Could you point to some? Exl2 is definitely on our todo...