Nicolas Patry
Just created a PR for it. We're going to add the `peft` dependency and others which already depend on PyTorch. This should fix it; however, I'll also incorporate your change...
Ok, it is merged. Could you try on latest (once it finishes uploading)? https://github.com/huggingface/text-generation-inference/actions/runs/5755083304 Edit: [sha-f91e9d2](https://github.com/orgs/huggingface/packages/container/text-generation-inference/115569670?tag=sha-f91e9d2)
We're running mostly on those... Do you mind opening a new issue and giving all the details you can provide?
Sorry, no: the 11.4 drivers actually have some stability issues regarding BF16/F16, so I'm not sure we want to support them. You should, however, be able to modify the source...
There are definitely some benefits in doing this. 1- We don't have to guess how the model processes our string input; also, we can override tokenization and produce a...
unstale. We'll see if we can leverage it in our default regular `transformers` branch, but it won't work with either flash attention or paged attention, leading to suboptimal performance in the...
Please provide the necessary information.
Sorry, we need the information suggested in the `New issue` prompt. Everything about your environment and what commands you are running. I am closing this for now since it's impossible...
Some information, like special-token semantics, is not contained in this library (it has no clue HOW the tokens are used). Have you tried doing something like
```python
tokenizer =...
```
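To make the point above concrete: since the library only stores tokens and not their semantics, marking a token as special is something you declare explicitly. A minimal sketch using the `tokenizers` Python bindings with a toy word-level vocabulary (the vocabulary and the `<sep>` token are made up for illustration; a real tokenizer would be loaded from a file):

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Toy vocabulary for illustration only.
vocab = {"[UNK]": 0, "hello": 1, "world": 2}
tokenizer = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# The library has no idea what "<sep>" is used for --
# we tell it the token exists and is special.
tokenizer.add_special_tokens(["<sep>"])

encoding = tokenizer.encode("hello <sep> world")
print(encoding.tokens)  # the special token is matched whole, not split by Whitespace
```

How the model then interprets `<sep>` (separator, chat turn boundary, etc.) lives outside the tokenizer, which is exactly why that information has to come from somewhere else.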
> does this mean the model needs to be remade? Also - the option "bitsandbytes-nf4" and "bitsandbytes-fp4" are not available options. I found "bitsandbytes" and "gptq" to be acceptable options...