Nicolas Patry


https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/utils/layers.py#L133-L164 All the code is there indeed.

No. bitsandbytes is slow because it does more computation afaik.

EETQ is missing from the docker image, my bad on this: https://github.com/huggingface/text-generation-inference/pull/1081

The error is in the protobuf version: the model you linked doesn't use a fast tokenizer (which is needed for additional checks in `text-generation-inference`), and the script fails during the conversion...

1. `pip install protobuf==3.19`
2. Check for `tokenizer.json` in the repo, that's the file used by fast tokenizers. Usually we can create a fast tokenizer from a slow one (see the sketch below), but...
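For step 2, a minimal sketch of producing a `tokenizer.json` by converting a slow tokenizer to a fast one, assuming the repo only ships slow tokenizer files and that `sentencepiece` and a compatible `protobuf` are installed (the repo id below is hypothetical):

```python
from transformers import AutoTokenizer

# Load the tokenizer and let transformers convert the slow tokenizer
# to a fast one. This conversion step is what needs protobuf/sentencepiece.
tokenizer = AutoTokenizer.from_pretrained("some-org/some-model", use_fast=True)  # hypothetical repo id

# Saving a fast tokenizer writes tokenizer.json, which
# text-generation-inference can then pick up.
tokenizer.save_pretrained("./converted-tokenizer")
```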

Wait for this to land: https://github.com/huggingface/text-generation-inference/pull/438 so you can use a lower-latency kernel (GPTQ).

GPTQ models are as fast as the non-quantized versions. I never ran bitsandbytes, so I have no clue, but iirc it is multiple times slower (~4x maybe?).

> I only need to replace --quantize "gptq" instead of --quantize "bitsandbytes". Correct? Or do I also need to replace the docker image?

Well you would need the newest docker...

Hey, I don't know for sure. The most obvious way would be to "write" the LoRA weights directly into your model, creating an entirely new, LoRA-free model. Not sure if/how...
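A minimal sketch of that approach using PEFT's `merge_and_unload`, assuming the adapter was trained with PEFT (the model and adapter ids below are hypothetical):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model and attach the LoRA adapter.
base = AutoModelForCausalLM.from_pretrained("some-org/base-model")      # hypothetical base model
model = PeftModel.from_pretrained(base, "some-org/lora-adapter")        # hypothetical adapter

# Fold the LoRA weights into the base weights, producing a plain model
# with no adapter layers left.
merged = model.merge_and_unload()

# Save it as a regular transformers model that text-generation-inference can serve.
merged.save_pretrained("./merged-model")
```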

We're going to do that automatically for you soon: https://github.com/huggingface/text-generation-inference/pull/762

In the meantime: https://github.com/huggingface/text-generation-inference/issues/482#issuecomment-1602174068

Closing this in favor of #482.