
ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`

AnandUgale opened this issue on May 18, 2024 · 0 comments

System Info

Packages installed with CUDA 11.8:

torch - 2.3.0+cu118
llama-index - 0.10.37
llama-index-llms-huggingface - 0.2.0
transformers - 4.39.0
accelerate - 0.27.0
bitsandbytes - 0.43.1
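
To rule out a stale or duplicate install (a common cause of this error even when the packages appear installed), a quick sketch like the following can confirm which versions the running interpreter actually resolves; the package names are the ones listed above:

```python
# Print the versions the current Python environment actually resolves,
# to rule out a stale or duplicate install of accelerate / bitsandbytes.
from importlib.metadata import version, PackageNotFoundError

for pkg in ["torch", "transformers", "accelerate", "bitsandbytes",
            "llama-index", "llama-index-llms-huggingface"]:
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "NOT FOUND in this environment")
```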

Reproduction

import torch
from llama_index.llms.huggingface import HuggingFaceLLM
from transformers import BitsAndBytesConfig

# Optional quantization to 4-bit
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

llm = HuggingFaceLLM(
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    model_kwargs={
        "token": hf_token,
        "torch_dtype": torch.bfloat16,  # comment this line and uncomment below to use 4-bit
        # "quantization_config": quantization_config,
    },
    generate_kwargs={
        "do_sample": True,
        "temperature": 0.6,
        "top_p": 0.9,
    },
    tokenizer_name="meta-llama/Meta-Llama-3-8B-Instruct",
    tokenizer_kwargs={"token": hf_token},
    stopping_ids=stopping_ids,
)
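
For comparison, here is a minimal transformers-only sketch (assuming the same model name, the BitsAndBytesConfig above, and `hf_token` defined as in the snippet above). If this raises the same ImportError, the problem is in the environment rather than in llama-index:

```python
# Minimal transformers-only load with the same 4-bit config; if this raises the
# same ImportError, the environment (not llama-index) is the cause.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=quantization_config,
    device_map="auto",
    token=hf_token,  # hf_token assumed to be a valid Hugging Face access token
)
```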

Expected behavior

ImportError: Using bitsandbytes 8-bit quantization requires Accelerate: pip install accelerate and the latest version of bitsandbytes: pip install -i https://pypi.org/simple/ bitsandbytes
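
transformers raises this ImportError when its internal availability checks fail, so it may help to see what it detects at runtime. A small check, assuming the `is_accelerate_available` / `is_bitsandbytes_available` helpers exposed by `transformers.utils`:

```python
# Check what transformers itself detects in the running interpreter; the
# ImportError above is raised when one of these checks fails.
from transformers.utils import is_accelerate_available, is_bitsandbytes_available

print("accelerate available:", is_accelerate_available())
print("bitsandbytes available:", is_bitsandbytes_available())
```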
