ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`
System Info
Packages installed with CUDA 11.8:

- torch 2.3.0+cu118
- llama-index 0.10.37
- llama-index-llms-huggingface 0.2.0
- transformers 4.39.0
- accelerate 0.27.0
- bitsandbytes 0.43.1
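A quick way to confirm the environment actually matches the versions listed above is to query the installed package metadata directly. This diagnostic sketch is not part of the original report; it uses only the standard library, so it runs even when some of the packages are missing.

```python
# Print installed versions of the packages involved, so any mismatch
# with the versions reported above is easy to spot.
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str) -> str:
    """Return the installed version of pkg, or 'not installed'."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return "not installed"

for pkg in ["torch", "llama-index", "llama-index-llms-huggingface",
            "transformers", "accelerate", "bitsandbytes"]:
    print(f"{pkg}: {installed_version(pkg)}")
```

Running this in the same interpreter that raises the error also rules out the common case where the packages were installed into a different virtual environment.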
Reproduction
```python
import torch
from llama_index.llms.huggingface import HuggingFaceLLM

# Optional quantization to 4bit
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# hf_token and stopping_ids are defined elsewhere in the script
llm = HuggingFaceLLM(
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    model_kwargs={
        "token": hf_token,
        "torch_dtype": torch.bfloat16,  # comment this line and uncomment below to use 4bit
        # "quantization_config": quantization_config
    },
    generate_kwargs={
        "do_sample": True,
        "temperature": 0.6,
        "top_p": 0.9,
    },
    tokenizer_name="meta-llama/Meta-Llama-3-8B-Instruct",
    tokenizer_kwargs={"token": hf_token},
    stopping_ids=stopping_ids,
)
```
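The error text suggests that, before loading a quantized model, transformers verifies that both `accelerate` and `bitsandbytes` can be imported. The following is an illustrative sketch of that kind of check, not the library's exact internals; the function name and message are assumptions.

```python
# Illustrative sketch of an importability check like the one behind the
# error; bnb_env_ok and its logic are assumptions, not transformers code.
import importlib.util

def bnb_env_ok() -> bool:
    # True only if both optional dependencies can be found on sys.path
    return (importlib.util.find_spec("accelerate") is not None
            and importlib.util.find_spec("bitsandbytes") is not None)

if not bnb_env_ok():
    print("transformers would raise: ImportError: Using `bitsandbytes` "
          "8-bit quantization requires Accelerate ...")
```

In practice this kind of check can also fail with both packages installed, for example when `bitsandbytes` itself raises during import because its native CUDA libraries cannot be loaded, so it may be worth trying `import bitsandbytes` on its own to see the underlying error.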
Expected behavior
ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`