Nicolas Patry comments

Results: 978 comments by Nicolas Patry

ERROR: Failed building wheel for tokenizers

GitHub did not provide an action runner for M1 at the time, so builds were manual (and infrequent). Any reason you cannot upgrade to `0.13.2` or `0.12.6`? But yes...

ERROR: Failed building wheel for tokenizers

Hmm, interesting. Could you try force-installing 0.12.6 and see if that fixes it? If you could share your env (Python version + hardware (M1, I guess) + requirements.txt)...
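A minimal sketch of how the environment details requested above (Python version and hardware) could be gathered with the standard library. The function name `describe_environment` is hypothetical and not part of `tokenizers`; the contents of `requirements.txt` still have to be shared separately.

```python
import platform
import sys


def describe_environment() -> dict:
    """Collect the Python version and hardware details of this machine."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),    # e.g. "Darwin" on macOS
        "machine": platform.machine(),  # e.g. "arm64" on Apple Silicon
        "executable": sys.executable,   # which interpreter is in use
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Pasting this output next to the requirements file makes it easier to tell whether a prebuilt wheel should exist for the platform or whether pip is falling back to a source build.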

ERROR: Failed building wheel for tokenizers

I got confused with 0.11.6, sorry! And I don't see the builds for 0.12 for arm, I'm guessing we moved to 0.13 first. TBH there "shouldn't" be any major...

Safetensors support

You're right, it's not that important. /s Just because you haven't been affected (to your knowledge) doesn't mean it's not real. We have been receiving reports of actual attacks though,...

Unable to deploy lmsys/vicuna-13b-v1.5-16k

@monuminu Yes, you need to adjust all parameters so that the requests fit in the VRAM left over after the model is loaded.
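To make the VRAM arithmetic concrete, here is a rough sketch of how many tokens of key/value cache fit in the memory left after the weights are loaded. Everything numeric is an illustrative assumption rather than a measurement: the layer and head counts are those of a Llama-style 13B model, the weights are taken as 26 GiB, the GPU as 40 GiB, and 2 GiB is set aside for activations and allocator overhead. The function names are hypothetical and are not part of any server's API.

```python
GIB = 1024 ** 3


def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_element: int = 2) -> int:
    """Bytes of KV cache one token occupies (fp16 by default)."""
    # Factor 2: each layer stores one key and one value tensor per token.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_element


def max_tokens_in_budget(total_vram_bytes: int, model_bytes: int,
                         bytes_per_token: int, reserve_bytes: int = 0) -> int:
    """Tokens that fit in the VRAM left once the model is loaded."""
    free_bytes = total_vram_bytes - model_bytes - reserve_bytes
    if free_bytes <= 0:
        return 0
    return free_bytes // bytes_per_token


if __name__ == "__main__":
    # Assumed shapes for a Llama-style 13B model: 40 layers, 40 heads, head_dim 128.
    per_token = kv_cache_bytes_per_token(num_layers=40, num_kv_heads=40, head_dim=128)
    budget = max_tokens_in_budget(
        total_vram_bytes=40 * GIB,  # assumed GPU size
        model_bytes=26 * GIB,       # assumed fp16 weight footprint
        bytes_per_token=per_token,
        reserve_bytes=2 * GIB,      # assumed headroom for activations
    )
    print(f"{per_token} bytes per token, about {budget} tokens in total")
```

Under these assumptions the total token budget across all concurrent requests is far smaller than a naive "batch size times 16k context" would need, which is why the limits have to be lowered together rather than one at a time.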

Qwen-7B support

> fairly similar to llama

Seems exactly the same at first glance, just fork it and make it look like llama maybe?

Odd CUDA OOM

The Warmup phase (the one crashing) tries to allocate the MAXIMUM possible request, mimicking your server under load.

> text_generation_launcher: Method Warmup encountered an error.

We try to...

Odd CUDA OOM

Yes. In general, though, PyTorch will allocate memory however it likes, so what `nvidia-smi` reports might not reflect what is actually necessary.

Odd CUDA OOM

0.9.3 had issues because we were using AsyncMalloc, and it seems PyTorch doesn't do a great job of tracking those allocations, leading to all sorts of issues everywhere. We did...

Support exllamav2 (exl2) quantized models

> There are lots of models on HF which are only offered in either F16 or exl2 format

Could you point to some? Exl2 is definitely on our todo...