Nicolas Patry
Just created a PR for it. We're going to add the `peft` dependency and others which already depend on PyTorch. This should fix it; however, I'll also incorporate your change...
Ok, it is merged. Could you try on latest (once it finishes uploading)? https://github.com/huggingface/text-generation-inference/actions/runs/5755083304 Edit: [sha-f91e9d2](https://github.com/orgs/huggingface/packages/container/text-generation-inference/115569670?tag=sha-f91e9d2)
We're running mostly on those... Do you mind opening a new issue and giving all the details you can provide?
Sorry, no: the 11.4 drivers actually have some stability issues regarding BF16/F16, so I'm not sure we want to support them. You should, however, be able to modify the source...
There are definitely some benefits in doing this. 1- We don't have to guess how the model processes our string input; also, we can override tokenization and produce a...
unstale. We'll see if we can leverage it in our default regular `transformers` branch, but it won't work with either flash attention or paged attention, leading to suboptimal performance in the...
Please provide the necessary information.
Sorry, we need the information suggested in the `New issue` prompt. Everything about your environment and what commands you are running. I am closing this for now since it's impossible...
Some information, like special-token semantics, is not contained in this library (it has no clue HOW the tokens are used). Have you tried doing something like
```python
tokenizer =...
```
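To make the point above concrete: since the library only stores tokens and not their semantics, marking a token as special is something you declare explicitly. A minimal sketch using the `tokenizers` Python bindings with a toy word-level vocabulary (the vocabulary and the `<sep>` token are made up for illustration; a real tokenizer would be loaded from a file):

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Toy vocabulary for illustration only.
vocab = {"[UNK]": 0, "hello": 1, "world": 2}
tokenizer = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# The library has no idea what "<sep>" is used for --
# we tell it the token exists and is special.
tokenizer.add_special_tokens(["<sep>"])

encoding = tokenizer.encode("hello <sep> world")
print(encoding.tokens)  # the special token is matched whole, not split by Whitespace
```

How the model then interprets `<sep>` (separator, chat turn boundary, etc.) lives outside the tokenizer, which is exactly why that information has to come from somewhere else.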
> does this mean the model needs to be remade? Also - the option "bitsandbytes-nf4" and "bitsandbytes-fp4" are not available options. I found "bitsandbytes" and "gptq" to be acceptable options...